Abstract

The monotone rearrangement algorithm was introduced by Hardy,
Littlewood and Pólya as a sorting device for functions. Assuming that
$x$ is a monotone function and that an estimate $x_n$ of $x$ is given,
consider the monotone rearrangement $\hat{x}_n$ of $x_n$. This new
estimator is shown to be uniformly consistent. Under suitable
assumptions, pointwise limit distribution results for $\hat{x}_n$ are
obtained. The framework is general and allows for weakly dependent and
long range dependent stationary data. Applications in monotone density
and regression function estimation are detailed.

Keywords: Limit distributions, density estimation, regression function
estimation, dependence, monotone rearrangement.



1 Introduction 

Assume that $(t_i, x(t_i))_{i=1}^n$, for some points $t_i \in [0,1]$
(e.g. $t_i = i/n$), are pairs of data points. The (decreasing) sorting of
the points $x(t_i)$ is then an elementary operation and produces the new
sorted sequence of pairs $(t_i, y(t_i))$, where $y = \mathrm{sort}(x)$ is
the sorted vector. Let $\#$ denote the counting measure of a set. Then we
can define the sorting $y$ of $x$ by

$$ z(s) = \#\{t_i : x(t_i) > s\}, $$
$$ y(t) = z^{-1}(t), $$

where $z^{-1}$ denotes the inverse of a function (if the points $x(t_i)$
are not unique it denotes the generalized inverse).
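
The equivalence between sorting and inverting the level count can be checked on a small vector. The sketch below is only an illustration: the rank convention $y_k = \inf\{s : z(s) < k\}$ for $k = 1, \dots, n$ and the function names are assumptions, since the text does not fix the scale of $t$.

```python
def upper_level_count(x, s):
    """z(s) = #{t_i : x(t_i) > s}: number of data values strictly above s."""
    return sum(1 for v in x if v > s)


def sort_via_level_sets(x):
    """Decreasing sort of x as the generalized inverse of z.

    Assumed convention (not fixed by the text): the k-th sorted value is
    y_k = inf{s : z(s) < k}, k = 1, ..., n. The infimum is attained at one
    of the data values, so only those are searched.
    """
    levels = sorted(set(x))
    return [
        min(s for s in levels if upper_level_count(x, s) < k)
        for k in range(1, len(x) + 1)
    ]


x = [0.3, 1.2, 0.7, 1.2, -0.5]
y = sort_via_level_sets(x)
print(y)  # same values as sorted(x, reverse=True)
```

Ties are handled by the generalized inverse: the repeated value 1.2 simply occupies two consecutive ranks.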

The "sorting" of a function $\{x(t), t \in [0,1]\}$ can then analogously
be defined by the monotone rearrangement (cf. Hardy et al. [21]),

$$ z(s) = \lambda\{t \in [0,1] : x(t) > s\}, $$
$$ y(t) = z^{-1}(t), $$

where the counting measure $\#$ has been replaced by the Lebesgue measure
$\lambda$, and $z^{-1}$ denotes the generalized inverse.
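
For a function the same construction can be approximated on a grid. The sketch below is an illustration under an assumed discretisation: Lebesgue measure on $[0,1]$ is replaced by the uniform measure on $n$ midpoints, so the rearrangement becomes a decreasing sort of sampled values. For the test function $x(t) = 4t(1-t)$ one has $z(s) = \sqrt{1-s}$ and hence $y(t) = 1 - t^2$, which gives a closed form to compare against.

```python
def rearrange_on_grid(f, n):
    """Approximate the decreasing rearrangement of f on [0, 1].

    Lebesgue measure is replaced by the uniform measure on the n midpoints
    t_i = (i + 0.5) / n (an assumed discretisation), so the rearrangement
    reduces to a decreasing sort of the sampled values.
    """
    grid = [(i + 0.5) / n for i in range(n)]
    values = sorted((f(t) for t in grid), reverse=True)
    return grid, values


# x(t) = 4 t (1 - t): z(s) = sqrt(1 - s), hence y(t) = 1 - t**2.
n = 1000
grid, y = rearrange_on_grid(lambda t: 4.0 * t * (1.0 - t), n)
max_err = max(abs(v - (1.0 - t * t)) for t, v in zip(grid, y))
print(max_err)  # discretisation error, of order 1/n
```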

The monotone rearrangement algorithm of a set or a function has mainly
been used as a device in analysis, see e.g. Lieb and Loss [22, Chapter 3],
or in optimal transportation (see Villani [37, Chapter 3]). Fougères [15]
was the first to use the algorithm in a statistical context, for density
estimation under order restrictions. Meanwhile, Polonik [28], [29] also
developed tools of a similar kind for density estimation for multivariate
data. More recently, several authors revisited the monotone rearrangement
procedure in the estimation context under monotonicity; see Dette et al.
[11], and Chernozhukov et al.

We introduce the following two-step approach for estimating a monotone
function. Assume that $x$ is a monotone function on an interval
$I \subset \mathbb{R}$. Assume also that we already have an estimate
$x_n$ of $x$, but that this estimate is not necessarily monotone. We then
propose to use the monotone rearrangement $\hat{x}_n$ of $x_n$ as a new
estimate of $x$.

Under the assumption that we have process limit distribution results for
(a localized version of) the stochastic part of $x_n$ and that the
deterministic part of $x_n$ is asymptotically differentiable at a fixed
point $t_0$, with strictly negative derivative, we obtain pointwise limit
distribution results for $\hat{x}_n(t_0)$. The framework is general and
allows for weakly dependent as well as long range dependent data. This is
the topic of Section 3.

Possible applications of the general results are to monotone density and
regression function estimation, which we explore in more detail in
Section 4. These are the problems of estimating $f$ and $m$ respectively
in

(i) $t_1, \dots, t_n$ stationary observations with marginal decreasing
    density $f$ on $\mathbb{R}^+$,
(ii) $(t_i, y_i)$ observations from $y_i = m(t_i) + \epsilon_i$,
    $t_i = i/n$, $i = 1, \dots, n$, $m$ decreasing on $[0,1]$,
    $\{\epsilon_i\}$ stationary sequence with mean zero.

The standard approaches in these two problems have been isotonic
regression for the regression problem, first studied by Brunk, and
(nonparametric) Maximum Likelihood estimation (NPMLE) for the density
estimation problem, first introduced by Grenander [18]. A wide literature
exists for regression and density estimation under order restrictions.
One can refer e.g. to Mukerjee; Ramsay [31]; Mammen [23]; Hall and Huang
[19]; Mammen et al. [24]; Gijbels; Birke and Dette; Dette and Pilz; Dette
et al. [11] for the regression context. Besides, see Eggermont and
LaRiccia [14], Fougères [15], Hall and Kang [20], Meyer and Woodroofe
[25], Polonik [28], Van der Vaart and Van der Laan [35], among others,
for a focus on monotone (or unimodal) density estimation. Anevski and
Hössjer [2] gave a general approach unifying both contexts.

Using kernel estimators as preliminary estimators of $f$ and $m$ on which
the monotone rearrangement is then applied, we are able to derive limit
distribution results for quite general dependence situations, demanding
essentially stationarity for the underlying random parts $\{t_i\}$ and
$\{\epsilon_i\}$ respectively. The results are however stated in a form
that allows for starting points other than kernel estimators, e.g.
wavelet or spline estimators.

The paper is organized as follows: In Section 2 we define the monotone
rearrangement algorithm and derive some simple properties that will be
used in the sequel. In particular our definition differs slightly from
Hardy, Littlewood and Pólya's original definition [21]; the difference is
motivated by the fact that we will use localization and restriction. The
most important properties of the algorithm that are derived are the
equivariance under addition of constants, the continuity of the map and a
certain localization property, cf. Lemma 3, Theorem 1 and Theorem 2
below. Furthermore we state conditions that allow for the extension of
the map to unbounded intervals.

In Section 3 we define the generic estimator of the monotone function,
and state the consistency and limit distribution properties for the
estimator. The limit distribution is given in Theorem 4 and is of the
general form

$$ d_n^{-1}[\hat{x}_n(t_0) - x(t_0)] \xrightarrow{\mathcal{L}} T(A\,\cdot + \tilde{v}(\cdot; t_0))(0) + \Delta, $$

where $T$ is the monotone rearrangement map,
$\Delta = \lim_{n\to\infty} d_n^{-1}[E\{x_n(t_0 + s d_n)\} - x(t_0)]$ is
the asymptotic local bias of the preliminary estimator and
$\tilde{v}(s; t_0) = \lim_{n\to\infty} d_n^{-1}[x_n(t_0 + s d_n) - E\{x_n(t_0 + s d_n)\}]$
is the weak local limit of the process part of the preliminary estimator;
here $d_n \downarrow 0$ is a deterministic sequence that is determined by
the dependence structure of the data.

In Section 4 we apply the results obtained in Section 3 to regression
function estimation and density estimation under order restrictions, and
derive the limit distributions for the estimators. This gives rise to
some new universal limit random variables, such as e.g. in the regression
context $T(s + B(s))(0)$ with $T$ the monotone rearrangement map and $B$
standard two sided Brownian motion for independent and weakly dependent
data, or $T(s + B_{1,\beta}(s))(0)$ with $B_{1,\beta}$ fractional
Brownian motion with self similarity parameter $\beta$, when data are
long range dependent. The rate of convergence $d_n$ is e.g. for the
regression problem the optimal $n^{-1/3}$ in the i.i.d. and weakly
dependent data context and of a non-polynomial rate in the long range
dependent context, similarly to previously obtained results in isotonic
regression for long range dependent data, cf. Anevski and Hössjer [2].

In the appendix we derive some useful but technical results on maximal 
bounds on the rescaled process parts in the density and regression estimation 
problems, i.e. for the local partial sum process and empirical processes, for 
weakly dependent as well as long range dependent data. 

2 The monotone rearrangement algorithm 

Consider an interval $I \subset \mathbb{R}$, and let
$\mathcal{B}(I) = \{f : f(I) \text{ bounded}\}$ and
$\mathcal{D}(I) = \{f : f \text{ decreasing on } I\}$. For each Borel set
$A$ of $\mathbb{R}$, denote by $\lambda(A)$ the Lebesgue measure of $A$
on $\mathbb{R}$. In a first step, the monotone rearrangement will be
defined for finite intervals $I$, and some extensions for infinite $I$
will be discussed in a second step.

2.1 Definition and properties for finite intervals 

Definition 1. Let $I \subset \mathbb{R}$ be a finite interval, and assume
$f \in \mathcal{B}(I)$. Let $r_{f,I}$ be the right continuous map from
$f(I)$ to $\mathbb{R}^+$, called "upper level set function" of $f$ and
defined for each $u \in f(I)$ by

$$ r_{f,I}(u) := \lambda\{t \in I : f(t) > u\} = \lambda\{I \cap f^{-1}(u, \infty)\}. $$

The monotone rearrangement map
$T_I : \mathcal{B}(I) \ni f \mapsto T_I(f) \in \mathcal{D}(I)$ is defined
up to a translation as the (right continuous) generalized inverse of the
upper level set function

$$ T_I(f)(t) := \inf\{u \in f(I) : r_{f,I}(u) \le t - \inf I\}, \qquad (1) $$

for $t \in I$.
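
Definition 1 can be evaluated directly on a grid. The sketch below is an illustration under assumptions: the image $f(I)$ is replaced by the values of $f$ on $n$ midpoints of $I$, the inequality in (1) is read as $\le$, and the function names are hypothetical. For $f(t) = t$ on $I = [1,3]$ one has $r_{f,I}(u) = 3 - u$, so $T_I(f)(t) = 4 - t$, which serves as a check.

```python
def upper_level_set(values, length, u):
    """r_{f,I}(u): grid approximation of lambda{t in I : f(t) > u}."""
    return length * sum(1 for v in values if v > u) / len(values)


def rearrangement_at(f, a, b, t, n=400):
    """T_I(f)(t) = inf{u in f(I) : r_{f,I}(u) <= t - inf I} on I = [a, b].

    f(I) is replaced by the values of f on n midpoints of I (an assumed
    discretisation), so the result is accurate up to the grid resolution.
    """
    h = (b - a) / n
    values = [f(a + (i + 0.5) * h) for i in range(n)]
    feasible = [u for u in values if upper_level_set(values, b - a, u) <= t - a]
    return min(feasible)


# f(t) = t on I = [1, 3] has r_{f,I}(u) = 3 - u, hence T_I(f)(t) = 4 - t.
a, b = 1.0, 3.0
points = (1.5, 2.0, 2.5)
approx = [rearrangement_at(lambda t: t, a, b, t) for t in points]
print(approx)
```

The shift by $\inf I$ in (1) is what makes the rearranged function live on the same interval $I$ as $f$, rather than on $[0, \lambda(I)]$.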

The following lemmas list some simple and useful properties of the maps
$u \mapsto r_{f,I}(u)$, $f \mapsto r_{f,I}$ and $f \mapsto T_I(f)$
respectively.

Lemma 1. Assume $I \subset \mathbb{R}$ is a finite interval, and
$f \in \mathcal{B}(I)$. Then

(i) If $f$ has no flat regions on $I$, i.e.
    $\lambda\{I \cap f^{-1}(u)\} = 0$ for all $u \in f(I)$, then
    $r_{f,I}$ is continuous.
(ii) If there is a $u \in f(I)$ such that
    $\lambda\{I \cap f^{-1}(u)\} = c > 0$, then $r_{f,I}$ has a
    discontinuity at $u$ of height $c$.
(iii) If $f$ has a discontinuity at $t_0 \in I$ and $f$ is decreasing,
    then $r_{f,I}$ admits a flat region with level $t_0$.

Proof Assertions (i) and (ii) are both consequences of the fact that

$$ \lim_{u \to u_0} |r_{f,I}(u) - r_{f,I}(u_0)| = \lim_{u \to u_0} \lambda\{t \in I : \max(u, u_0) \ge f(t) > \min(u, u_0)\} = \lambda\{I \cap f^{-1}(u_0)\}, $$

which is equal to $0$ in (i), and to $c$ in (ii). Finally, assertion
(iii) arises from writing that $r_{f,I}(u) = r_{f,I}(f(t_0)) = t_0$ for
each $u \in (f(t_0), f(t_0^-))$. □

Lemma 2. Let $I \subset \mathbb{R}$ be a finite interval, and assume
$f \in \mathcal{B}(I)$. Then

(i) If $c$ is a constant then $r_{f+c,I}(u) = r_{f,I}(u - c)$, for each
    $u \in f(I) + c$.

(ii) $r_{cf,I}(u) = r_{f,I}(u/c)$ if $c > 0$, for each $u \in cf(I)$.

(iii) $f \le g \Rightarrow r_{f,I} \le r_{g,I}$.

(iv) Let $f_c(t) = f(tc)$. Then $c\, r_{f_c, I/c} = r_{f,I}$.

(v) Let $f_c(t) = f(t + c)$. Then $r_{f_c, I-c} = r_{f,I}$.

Proof (i)-(iii) follow from the definition; indeed, for each
$u \in f(I) + c$,
$r_{f+c,I}(u) = \lambda\{t \in I : f(t) + c > u\} = r_{f,I}(u - c)$, and
for each $u \in cf(I)$,
$r_{cf,I}(u) = \lambda\{t \in I : cf(t) > u\} = r_{f,I}(u/c)$ if $c > 0$.
As for (iii),
$\{t \in I : f(t) > u\} \subset \{t \in I : g(t) > u\}$, for each fixed
$u$, if $f \le g$. Statement (iv) follows from
$r_{f_c, I/c}(u) = \lambda\{t \in I/c : f(ct) > u\} = \lambda\{s/c \in I/c : f(s) > u\} = r_{f,I}(u)/c$,
for each $u \in f(I)$. Statement (v) is a consequence of
$r_{f_c, I-c}(u) = \lambda\{t \in I - c : f(t + c) > u\} = \lambda\{s - c \in I - c : f(s) > u\} = \lambda\{t \in I : f(t) > u\}$,
for each $u \in f(I)$. □



Lemma 3. Let $I \subset \mathbb{R}$ be a finite interval and assume $f$,
$g$ are functions in $\mathcal{B}(I)$. The monotone rearrangement map
$T_I$ satisfies the following:

(i) $T_I(f + c) = T_I(f) + c$, if $c$ is a constant;

(ii) $T_I(cf) = c\,T_I(f)$, if $c > 0$ is a constant;

(iii) $f \le g \Rightarrow T_I(f) \le T_I(g)$;

(iv) Let $f_c(t) = f(ct)$; then $T_{I/c}(f_c)(t) = T_I(f)(ct)$;

(v) Let $f_c(t) = f(t + c)$; then $T_{I-c}(f_c)(t) = T_I(f)(t + c)$.

Proof Let $I = [a, b]$; each assertion is a consequence of its
counterpart in Lemma 2. Let $t \in I$; statement (i) follows from
$T_I(f + c)(t) = \inf\{u \in f(I) + c : r_{f,I}(u - c) \le t - a\} = T_I(f)(t) + c$,
whereas (ii) comes from
$T_I(cf)(t) = \inf\{u \in cf(I) : r_{f,I}(u/c) \le t - a\} = c\,T_I(f)(t)$.
To show (iii), note that
$f \le g \Rightarrow r_{f,I} \le r_{g,I} \Rightarrow T_I(f) \le T_I(g)$.
Assertion (iv) follows from the fact that for each $t \in I/c$,
$T_{I/c}(f_c)(t) = \inf\{u \in f(I) : r_{f,I}(u) \le ct - a\} = T_I(f)(ct)$.
Finally, statement (v) follows since for each $t \in I - c$,
$T_{I-c}(f_c)(t) = \inf\{u \in f(I) : r_{f,I}(u) \le t + c - a\} = T_I(f)(t + c)$. □
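
Properties (i)-(iii) of Lemma 3 can be checked numerically on a grid, where the rearrangement reduces to a decreasing sort. This is a sketch under that assumed discretisation; the test functions and the constant $c$ are hypothetical choices.

```python
import math


def rearrange(values):
    """Grid version of T_I: decreasing sort of the sampled values."""
    return sorted(values, reverse=True)


n = 200
grid = [(i + 0.5) / n for i in range(n)]
f = [math.sin(6.0 * t) for t in grid]
g = [v + 0.1 * t for v, t in zip(f, grid)]  # g >= f pointwise
c = 2.5

# (i) T(f + c) = T(f) + c
lhs_i = rearrange([v + c for v in f])
rhs_i = [v + c for v in rearrange(f)]
# (ii) T(c f) = c T(f) for c > 0
lhs_ii = rearrange([c * v for v in f])
rhs_ii = [c * v for v in rearrange(f)]
# (iii) f <= g implies T(f) <= T(g)
order_ok = all(u <= v for u, v in zip(rearrange(f), rearrange(g)))

print(lhs_i == rhs_i, lhs_ii == rhs_ii, order_ok)
```

Both equivariance checks hold exactly here because adding a constant or multiplying by a positive constant preserves the ordering of the sampled values.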

The previous result implies that the map $T_I$ is continuous, as stated
in the following theorem.

Theorem 1. Let $\|\cdot\|$ be an arbitrary norm on $\mathcal{B}(I)$. Then
the map $T_I$ is a contraction, i.e.
$\|T_I(f) - T_I(g)\| \le \|f - g\|$. In particular, $T_I$ is a continuous
map, i.e. for all $f_n, f \in \mathcal{B}(I)$,

$$ \|f_n - f\| \to 0 \;\Rightarrow\; \|T_I(f_n) - T_I(f)\| \to 0, $$

as $n$ tends to infinity.

Proof Let $f, g$ be functions in $\mathcal{B}(I)$. Clearly
$g(u) - \|f - g\| \le f(u) \le g(u) + \|f - g\|$, which by Lemma 3 (i)
and (iii) implies that
$T_I(g)(u) - \|f - g\| \le T_I(f)(u) \le T_I(g)(u) + \|f - g\|$, so that
$|T_I(f) - T_I(g)|(u) \le \|f - g\|$, for each $u$. Since the right hand
side is independent of $u$, the absolute value on the left hand side can
be replaced by the norm, which implies the statement of the theorem. □
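
The contraction property can be observed numerically for the supremum norm, which is the norm the argument in the proof works with pointwise. The sketch below again assumes the grid discretisation (rearrangement as a decreasing sort); the two perturbed functions are hypothetical test inputs.

```python
import math
import random


def rearrange(values):
    """Grid version of T_I: decreasing sort of the sampled values."""
    return sorted(values, reverse=True)


def sup_dist(u, v):
    """Supremum distance between two functions sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(u, v))


random.seed(0)
n = 500
grid = [(i + 0.5) / n for i in range(n)]
f = [math.cos(5.0 * t) + random.uniform(-0.2, 0.2) for t in grid]
g = [math.cos(5.0 * t) + random.uniform(-0.2, 0.2) for t in grid]

before = sup_dist(f, g)                        # ||f - g||
after = sup_dist(rearrange(f), rearrange(g))   # ||T(f) - T(g)||
print(before, after)
```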



Remark 1. One can also refer to Lieb and Loss [22, Theorem 3.5] for a
proof of the contraction property (the "non expansivity" property of the
map $T_I$), for the $L^p$-norms. □



2.2 Extension to infinite intervals 

It is not possible to define a monotone rearrangement on an infinite
interval $I$ or on $\mathbb{R}$ for any function
$\varphi \in \mathcal{B}(\mathbb{R})$. This can however be done for
positive functions $f$ such that for each $u > 0$,
$r_f(u) := \lambda\{t \in \mathbb{R} : f(t) > u\} < +\infty$, defining in
this situation

$$ T(f)(t) := \inf\{u \in \mathbb{R}^+ : r_f(u) \le t\}, $$

for each positive $t$. Such a definition is precisely the definition
considered by Hardy et al. [21, Chapter 10.12], and is in particular
valid for densities $f \in \mathcal{B}(\mathbb{R})$ (see also Lieb and
Loss [22, Chapter 3] and Fougères [15]).

If it remains impossible to define $T(\varphi)$ for any function
$\varphi$ for which $r_\varphi(u)$ is possibly infinite for some positive
$u$, such a definition can be given locally around a fixed point
$t \in I_0$, where $I_0$ is a finite interval, as soon as the function
$\varphi$ satisfies the following property:

There exists a constant $M < \infty$ and a finite interval $I_1$
including $I_0$ such that

$$ \inf_{t \in (\inf I_1, \sup I_0)} \varphi(t) > -M \quad\text{and}\quad \sup_{t \in (\sup I_1, \infty)} \varphi(t) < -M, \qquad (2) $$

$$ \inf_{t \in (-\infty, \inf I_1)} \varphi(t) > +M \quad\text{and}\quad \sup_{t \in (\inf I_0, \sup I_1)} \varphi(t) < +M. \qquad (3) $$

Theorem 2. Let $I_0$ be a finite and fixed interval, and let
$\varphi \in \mathcal{B}(\mathbb{R})$ be such that (2) and (3) are
satisfied. Then for any finite interval $J$ containing $I_1$, one has
$T_J(\varphi) = T_{I_1}(\varphi)$ on $I_0$.

Proof Define
$y_1 := \inf\{y \in I_1 : \forall x \in [\inf J, y[,\ \varphi(x) > \varphi(y)\}$
and $z_0 := \inf\{x \in J : \varphi(x) \in \varphi(I_0)\}$. It follows
from those definitions, from the left part of (3) and from the continuity
of $\varphi$ that $y_1 \in I_1$, $z_0 \in I_1$ and
$y_1 < z_0 < \inf I_0$. As a consequence, one has:

$$ r_{\varphi,J}\{\varphi(y_1)\} = \lambda\{t \in J : \varphi(t) > \varphi(y_1)\} $$
$$ = y_1 - \inf J + \lambda\{t \in I_1 \cap (y_1, \infty) : \varphi(t) > \varphi(y_1)\}, $$

where the second equality comes from splitting $J$ into
$J \cap (-\infty, y_1)$ and $J \cap (y_1, \infty)$, and using the right
part of (2). Similarly, one has

$$ r_{\varphi,I_1}\{\varphi(y_1)\} = y_1 - \inf I_1 + \lambda\{t \in I_1 \cap (y_1, \infty) : \varphi(t) > \varphi(y_1)\}, $$

so that the following equality holds:

$$ r_{\varphi,J}\{\varphi(y_1)\} + \inf J = r_{\varphi,I_1}\{\varphi(y_1)\} + \inf I_1. \qquad (4) $$

Now, define

$$ y_* := r_{\varphi,J}\{\varphi(y_1)\} + \inf J = r_{\varphi,I_1}\{\varphi(y_1)\} + \inf I_1. $$

It follows from this definition that
$T_J(\varphi)(y_*) = \varphi(y_1) = T_{I_1}(\varphi)(y_*)$. Besides,
$y_* < \inf I_0$. To prove this, note that
$T_J(\varphi)(\inf I_0) < M$ because of the right parts of (2) and (3);
so $y_* < \inf I_0$ will follow as soon as $T_J(\varphi)(y_*) > M$, since
$T_J(\varphi)$ is a decreasing function. This last inequality can be
proved easily by contradiction, using jointly that
$T_J(\varphi)(y_*) = \varphi(y_1)$ and the left part of (3).

Finally, let us check that if both functions $T_J(\varphi)$ and
$T_{I_1}(\varphi)$ cross at one point (say, $y_*$), then they will
coincide for each point $\sup I_0 > x > y_*$: Under the hypothesis that
they cross at $y_*$, it is equivalent to show that for each
$-M < u < \varphi(y_1)$, one gets

$$ r_{\varphi,J}(u) + \inf J = r_{\varphi,I_1}(u) + \inf I_1. \qquad (5) $$

Let $u \in [-M, \varphi(y_1)]$, and write on one hand

$$ r_{\varphi,J}(u) = \lambda\{t \in J : \varphi(t) > u\} $$
$$ = \lambda\{t \in J : \varphi(t) > \varphi(y_1)\} + \lambda\{t \in J : \varphi(y_1) \ge \varphi(t) > u\} $$
$$ = r_{\varphi,J}\{\varphi(y_1)\} + \lambda\{t \in J \cap (y_1, \infty) : \varphi(y_1) \ge \varphi(t) > u\} $$
$$ = r_{\varphi,I_1}\{\varphi(y_1)\} + \inf I_1 - \inf J + \lambda\{t \in I_1 \cap (y_1, \infty) : \varphi(y_1) \ge \varphi(t) > u\}, $$

where the last equality follows from (4) and the right part of (2). On
the other hand,

$$ r_{\varphi,I_1}(u) = r_{\varphi,I_1}\{\varphi(y_1)\} + \lambda\{t \in I_1 : \varphi(y_1) \ge \varphi(t) > u\} $$
$$ = r_{\varphi,I_1}\{\varphi(y_1)\} + \lambda\{t \in I_1 \cap (y_1, \infty) : \varphi(y_1) \ge \varphi(t) > u\}, $$

so that equality (5) holds, and this concludes the proof of Theorem 2. □
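
The localization property of Theorem 2 can be illustrated on a grid. The sketch below is a numerical illustration, not part of the proof: the function `phi`, the intervals $J = [-8,8]$, $I_1 = [-6,6]$, $I_0 = [-1,1]$ and the grid step are hypothetical choices, made so that (2) and (3) hold (for instance with $M = 5$). The rearrangement over $J$ and the one over $I_1$ are both computed by a decreasing sort and compared on $I_0$.

```python
import math


def phi(t):
    """Decreasing trend with a non-monotone wiggle confined to |t| <= pi."""
    wiggle = 1.5 * math.sin(2.0 * t) if abs(t) <= math.pi else 0.0
    return -t + wiggle


def rearrange(values):
    """Grid version of the rearrangement: decreasing sort."""
    return sorted(values, reverse=True)


h = 0.01
# Midpoint grid on J = [-8, 8]; indices 200..1399 cover I_1 = [-6, 6].
grid_J = [-8.0 + (i + 0.5) * h for i in range(1600)]
vals_J = [phi(t) for t in grid_J]
lo, hi = 200, 1400

T_J = rearrange(vals_J)            # rearrangement over J
T_I1 = rearrange(vals_J[lo:hi])    # rearrangement over I_1 only

# Compare both rearrangements on I_0 = [-1, 1], a subinterval of I_1.
idx_I0 = [i for i in range(lo, hi) if -1.0 <= grid_J[i] <= 1.0]
agree = all(T_J[i] == T_I1[i - lo] for i in idx_I0)
print(agree)
```

Outside $I_1$ the function is already decreasing and separated in level from its values inside $I_1$, so enlarging the interval only appends values at both ends of the sorted sequence and leaves the part over $I_0$ unchanged.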

Theorem 2 implies that an extension of the definition of $T_J$ to
$J = \mathbb{R}$ can be given for any continuous function
$\varphi \in \mathcal{B}(\mathbb{R})$ such that (2) and (3) hold. Indeed,
for any finite interval $J$ big enough, $T_J(\varphi)(t)$ does not depend
anymore on $J$, so that one can define, for each $t \in I_0$:

$$ T(\varphi)(t) := T_{I_1}(\varphi)(t). $$

A straightforward consequence of this definition is that Lemma 3,
Theorem 1 and Theorem 2 hold for $T$:

Corollary 1. Let $I_0 \subset \mathbb{R}$ be a finite and fixed interval.
Assume $\varphi$ is continuous and satisfies (2) and (3). Then

(i) $T$ satisfies Lemma 3, with the equalities and inequalities assumed
    to hold on $I_0$,

(ii) $T$ satisfies Theorem 1 with norm $\|\cdot\|$ defined on the set of
    functions on $I_0$,

(iii) Theorem 2 holds with $T_J$ replaced by $T$.

3 The monotone estimation procedure 
3.1 Definition and first properties 

Let $x$ be a function of interest (such as a density function, or a
regression function) and assume $x$ is non increasing. Consider an
estimator $x_n$ of $x$ constructed from $n$ observations, which is not
supposed to be monotone. Typically, $x_n$ can be an estimator based on
kernels, wavelets, splines, etc.

Definition 2. We define as a new estimator of $x$ the monotone
rearrangement of $x_n$, namely $T(x_n)$. This is a non increasing
estimator of $x$.

Theorem 3. (i). Assume that $\{x_n\}_{n \ge 1}$ is a uniformly consistent
estimator of $x$ (in probability, uniformly on a compact set
$B \subset \mathbb{R}$). If $x$ is non increasing, then
$\{T(x_n)\}_{n \ge 1}$ is a uniformly consistent estimator of $x$ (in
probability, uniformly on $B$).

(ii). Assume that $\{x_n\}_{n \ge 1}$ is an estimator that converges in
probability in $L^p$ norm to $x$. If $x$ is non increasing, then
$\{T(x_n)\}_{n \ge 1}$ converges in probability in $L^p$ norm to $x$.

Proof Both (i) and (ii) follow from the fact that
$\|x\| = \sup_{t \in \mathbb{R}} |x(t)|$ is a norm, and $T$ a contraction
with respect to $\|\cdot\|$, by Theorem 1. Moreover $T(x) = x$ if $x$ is
non increasing. □
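
The two-step procedure can be sketched on simulated data. Everything specific below is an assumption made for illustration: the regression function `m`, the noise level, the bandwidth, and the use of a Nadaraya-Watson smoother with an Epanechnikov kernel as preliminary estimator (the applications in this paper use a Gasser-Müller estimator instead). On the grid, the rearrangement $T(x_n)$ is computed as a decreasing sort.

```python
import math
import random


def m(t):
    """Hypothetical decreasing regression function used for illustration."""
    return 1.0 - t


def kernel_smoother(ts, ys, h, grid):
    """Nadaraya-Watson smoother with the Epanechnikov kernel.

    This stands in for the preliminary estimator x_n; it is a swapped-in
    choice, not the Gasser-Muller estimator used in the paper.
    """
    out = []
    for t in grid:
        w = [max(0.0, 1.0 - ((t - s) / h) ** 2) for s in ts]
        out.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return out


random.seed(1)
n = 200
ts = [(i + 1) / n for i in range(n)]
ys = [m(t) + random.gauss(0.0, 0.3) for t in ts]

grid = [(i + 0.5) / 100 for i in range(100)]
x_n = kernel_smoother(ts, ys, h=0.05, grid=grid)   # preliminary, not monotone
x_hat = sorted(x_n, reverse=True)                  # T(x_n) on the grid

truth = [m(t) for t in grid]
err_pre = max(abs(a - b) for a, b in zip(x_n, truth))
err_post = max(abs(a - b) for a, b in zip(x_hat, truth))
print(err_pre, err_post)
```

Because $T(m) = m$ for a non increasing $m$ and $T$ is a contraction in the supremum norm, the rearranged estimate is never further from $m$ than the preliminary one, which is the mechanism behind Theorem 3.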



Remark 2. The strong convergence in $L^p$-norm of $T(f_n)$ to $f$, as a
consequence of the corresponding result for $f_n$, was first established
in Fougères [15, Theorem 5] in the case when $f_n$ is the kernel
estimator of a density function $f$. Chernozhukov et al. give a
refinement of the non expansivity property, see their Proposition 1,
part 2, providing a bound for the gain obtained by rearranging $f_n$ and
examining the multivariate framework as well.



3.2 Limit distribution results 

Let $J \subset \mathbb{R}$ be a finite or infinite interval, and $C(J)$
the set of continuous functions on $J$. Let $x_n$ be a stochastic process
in $C(J)$ and let $t_0$ be a fixed interior point in $J$. In this section
limit distribution results for the random variable $T(x_n)(t_0)$ will be
derived, where $T$ is the monotone rearrangement map. The proofs of these
results are along the lines of Anevski and Hössjer [2], and their
notation will be used for clarity.

Assume that $\{x_n\}_{n \ge 1}$ is a sequence of stochastic processes in
$C(J)$ and write

$$ x_n(t) = x_{b,n}(t) + v_n(t), \qquad (6) $$

for $t \in J$. Given a sequence $d_n \downarrow 0$ and an interior point
$t_0$ in $J$ define $J_{n,t_0} = d_n^{-1}(J - t_0)$. Then, for
$s \in J_{n,t_0}$, it is possible to rescale the deterministic and
stochastic parts of $x_n$ as

$$ w_n(s; t_0) = d_n^{-1}\{v_n(t_0 + s d_n) - v_n(t_0)\}, $$
$$ g_n(s) = d_n^{-1}\{x_{b,n}(t_0 + s d_n) - x_{b,n}(t_0)\}, $$

which decomposes the rescaling of $x_n$ as

$$ d_n^{-1}\{x_n(t_0 + s d_n) - x_n(t_0)\} = g_n(s) + w_n(s; t_0). $$

However, due to the fact that the final estimator needs to be centered at
the estimand $x(t_0)$ and not at the preliminary estimator $x_n(t_0)$, it
is more convenient to introduce the following rescaling

$$ \tilde{v}_n(s; t_0) = d_n^{-1} v_n(t_0 + s d_n) \qquad (7) $$
$$ \phantom{\tilde{v}_n(s; t_0)} = w_n(s; t_0) + d_n^{-1} v_n(t_0), $$
$$ \tilde{g}_n(s) = d_n^{-1}\{x_{b,n}(t_0 + s d_n) - x(t_0)\} \qquad (8) $$
$$ \phantom{\tilde{g}_n(s)} = g_n(s) + d_n^{-1}\{x_{b,n}(t_0) - x(t_0)\}, $$

so that

$$ y_n(s) := \tilde{g}_n(s) + \tilde{v}_n(s; t_0) = d_n^{-1}\{x_n(t_0 + s d_n) - x(t_0)\}. \qquad (9) $$

This definition of the rescaled deterministic and stochastic parts is
slightly different from the one in Anevski and Hössjer [2], and is due to
the fact that we only treat the case when the preliminary estimator and
the final estimator have the same rates of convergence, in which case our
definition is more convenient, whereas in Anevski and Hössjer [2] other
possibilities occur.

The limit distribution results will be derived using a classical two-step
procedure, cf. e.g. Prakasa Rao [30]: A local limit distribution is first
obtained, under Assumption 1, stating that the estimator $T(x_n)$
converges weakly in a local and shrinking neighbourhood around a fixed
point. Then it is shown, under Assumption 2, that the limit distribution
of $T(x_n)$ is entirely determined by its behaviour in this shrinking
neighbourhood.

Assumption 1. There exists a stochastic process
$\tilde{v}(\cdot; t_0) \ne 0$ such that

$$ \tilde{v}_n(\cdot; t_0) \xrightarrow{\mathcal{L}} \tilde{v}(\cdot; t_0), $$

on $C(-\infty, \infty)$ as $n \to \infty$. The functions
$\{x_{b,n}\}_{n \ge 1}$ are monotone and there are constants $A < 0$ and
$\Delta \in \mathbb{R}$ such that for each $c > 0$,

$$ \sup_{|s| \le c} |\tilde{g}_n(s) - (As + \Delta)| \to 0, \qquad (10) $$

as $n \to \infty$.

In the applications typically

$$ A = \lim_{n \to \infty} \frac{g_n(s)}{s} = x'(t_0), $$
$$ \Delta = \lim_{n \to \infty} d_n^{-1}\{x_{b,n}(t_0) - x(t_0)\}, $$

so that $A$ is the local asymptotic linear term and $\Delta$ is the local
asymptotic bias, both properly normalized, of the preliminary estimator
$x_n$. Define the (limit) function

$$ y(s) = As + \Delta + \tilde{v}(s; t_0). \qquad (11) $$

Let $\{z_n\}$ be an arbitrary sequence of stochastic processes.

Assumption 2. Let $I_0$ be a given compact interval and $\delta > 0$.
There exists a positive constant $c$ such that $[-c, c] \supset I_0$ and
a finite positive $M$ such that

$$ \liminf_{n \to \infty} P\Big( \inf_{s \in (-c, \sup I_0)} z_n(s) > -M,\ \sup_{s \in (c, \infty)} z_n(s) < -M \Big) > 1 - \delta, \qquad (12) $$

and

$$ \liminf_{n \to \infty} P\Big( \inf_{s \in (-\infty, -c)} z_n(s) > +M,\ \sup_{s \in (\inf I_0, c)} z_n(s) < +M \Big) > 1 - \delta. \qquad (13) $$

Denote $T_c = T_{[-c,c]}$ and
$T_{c,n} = T_{[t_0 - c d_n,\, t_0 + c d_n]}$. The truncation result
Theorem 2 has a probabilistic counterpart in the following.

Lemma 4. Let $\{z_n\}$ satisfy Assumption 2. Let $I$ be a finite interval
in $\mathbb{R}$. Then for each compact interval $J \supset [-c, c]$

$$ \lim_{c \to \infty} \limsup_{n \to \infty} P\Big( \sup_I |T_c(z_n)(\cdot) - T_J(z_n)(\cdot)| = 0 \Big) = 1. $$

Proof Let $A_n$ and $B_n$ be the sets for which the probabilities are
bounded in (12) and (13), respectively. Then, using Theorem 2 with
$I_1 = [-c, c]$ and $I_0 = I$, it follows that
$A_n \cap B_n \subset \{\sup_I |T_c(z_n) - T_J(z_n)| = 0\}$ for each
compact interval $J \supset [-c, c]$. Since (12) and (13) imply
$P(A_n \cap B_n) > 1 - 2\delta$ it follows that
$\limsup_{n \to \infty} P(\sup_I |T_c(z_n)(\cdot) - T_J(z_n)(\cdot)| = 0) \ge 1 - 2\delta$.
Since $\delta > 0$ is arbitrary, taking the limit as $c \to \infty$ of
the left hand side of this expression implies the statement of the
lemma. □

Note that the previous lemma holds with $T_J$ replaced by $T_{I_n}$ for
an arbitrary sequence of intervals $I_n$ growing to $\mathbb{R}$.

Theorem 4. Let $J \subset \mathbb{R}$ be an interval, and $t_0$ be a
fixed point belonging to the interior of $J$. Suppose Assumption 1 holds.
Assume moreover that Assumption 2 holds for both $\{y_n\}$ and $y$. Then

$$ d_n^{-1}[T_J(x_n)(t_0) - x(t_0)] \xrightarrow{\mathcal{L}} T[A\,\cdot + \tilde{v}(\cdot; t_0)](0) + \Delta, \qquad (14) $$

as $n \to \infty$.

Proof Let $c > 0$ be fixed. We have

$$ d_n^{-1}\{T_J(x_n)(t_0) - x(t_0)\} = d_n^{-1}\{T_J(x_n)(t_0) - T_{c,n}(x_n)(t_0)\} $$
$$ \qquad + d_n^{-1}\{T_{c,n}(x_n)(t_0) - x(t_0)\}. \qquad (15) $$

Let us first consider the second term of the right hand side of (15) and
introduce

$$ \chi_n(s) := x_n(t_0 + s d_n) = x(t_0) + d_n y_n(s). \qquad (16) $$

Applying Lemma 3 (i) and (ii) leads to

$$ T_{c,n}(x_n)(t_0 + s d_n) = T_c(\chi_n)(s) = d_n T_c(y_n)(s) + x(t_0), $$

which gives

$$ d_n^{-1}\{T_{c,n}(x_n)(t_0) - x(t_0)\} = T_c(y_n)(0). $$

Assumption 1 implies that $y_n \xrightarrow{\mathcal{L}} y$ on
$C[-c, c]$, with $y$ defined in (11). Applying the continuous mapping
theorem on $T_c$, cf. Theorem 1, proves

$$ d_n^{-1}\{T_{c,n}(x_n)(t_0) - x(t_0)\} \xrightarrow{\mathcal{L}} T_c(y)(0) $$

as $n \to \infty$. Lemma 4 via Assumption 2 with $z_n = y$ implies

$$ T_c(y)(0) - T(y)(0) \xrightarrow{P} 0 $$

as $c \to \infty$.

Next we consider the first term of the right hand side of (15). Let $V$
be a positive and finite constant and denote
$A_{n,V} = [t_0 - V d_n, t_0 + V d_n]$. From (16) and Lemma 3 (i, ii) it
follows that

$$ \sup_{A_{n,V}} d_n^{-1} |T_{c,n}(x_n)(\cdot) - T_J(x_n)(\cdot)| = \sup_{[-V,V]} |T_c(y_n)(\cdot) - T_{J_{n,t_0}}(y_n)(\cdot)|, $$

with $y_n$ as defined in (9). If $J = \mathbb{R}$ (resp.
$J \ne \mathbb{R}$), Lemma 4 (resp. the note following Lemma 4) can be
used with $I = [-V, V]$ to obtain

$$ d_n^{-1}\{T_{c,n}(x_n)(t_0) - T_J(x_n)(t_0)\} \xrightarrow{P} 0 $$

if we first let $n \to \infty$ and then let $c \to \infty$.

Letting first $n$ and then $c$ tend to infinity in (15), applying
Slutsky's theorem and Lemma 3 (i) finishes the proof. □



Remark 3. The approach for deriving the limit distributions is similar to
the general approach in Anevski and Hössjer [2], with a preliminary
estimator that is made monotone via the $L^2$-projection on the space of
monotone functions. There are however a few differences:

- Anevski and Hössjer look at rescaling of an integrated preliminary
estimator of the monotone function, whereas we rescale the estimator
directly. Our approach puts a stronger assumption on the asymptotic
properties of the preliminary estimator, which is however traded off
against weaker conditions on the map $T$, since we only have to assume
that the map $T$ is continuous; had we dealt with rescaling as in Anevski
and Hössjer we would have had to prove that the composition $\tilde{T}$
(with $\tilde{T}$ defined by
$\tilde{T}(F)(t) = \int_0^t T(F')(u)\,du$) is a continuous map, which is
generally not true for $T$ equal to the monotone rearrangement map; it is
however true, under certain conditions, for $\tilde{T}$ equal to the
least concave minorant map (when $T$ becomes the $L^2$-projection on the
space of monotone functions), cf. Proposition 2 in Anevski and Hössjer
[2].

- Furthermore, we are able to do rescaling for the preliminary estimator
directly since it is a smooth function. On the contrary, for some of the
cases treated in Anevski and Hössjer this is not possible, e.g. for the
isotonic regression and the NPMLE of a monotone density the rescaled
stochastic part is asymptotically white noise. As a consequence our
rescaled deterministic function is assumed to be approximated by a linear
function, whereas the rescaled deterministic function in Anevski and
Hössjer [2] is assumed to be approximated by a convex or concave
function.

- Finally, the rescaling is here centered at $x(t_0)$, and not at
$x_n(t_0)$, which makes it more convenient to apply the limit
distribution result we get. □
4 Applications to nonparametric inference problems

In this section we present estimators of a monotone density function and
a monotone regression function. Limit distributions will be derived for
estimators of a decreasing marginal density $f$ for stationary weakly
dependent data, as well as of a monotone regression function $m$ with
stationary errors that are weakly or strongly dependent.

For the density estimation problem let $\{t_i\}$ denote a stationary
process with marginal density function $f$. Define the empirical
distribution function
$F_n(t) = n^{-1} \sum_{i=1}^n 1_{\{t_i \le t\}}$ and the centered
empirical process
$F_n^0(t) = n^{-1} \sum_{i=1}^n (1_{\{t_i \le t\}} - F(t))$. Consider a
sequence $\delta_n$ such that $\delta_n \downarrow 0$,
$n\delta_n \uparrow \infty$ as $n \to \infty$, and define the centered
empirical process locally around $t_0$ on scale $\delta_n$ as

$$ w_{n,\delta_n}(s; t_0) = \sigma_{n,\delta_n}^{-1}\, n\{F_n^0(t_0 + s\delta_n) - F_n^0(t_0)\} $$
$$ = \sigma_{n,\delta_n}^{-1} \sum_{i=1}^n \big(1_{\{t_0 < t_i \le t_0 + s\delta_n\}} - F(t_0 + s\delta_n) + F(t_0)\big), \qquad (17) $$

where

$$ \sigma_{n,\delta_n}^2 = \mathrm{Var}\big[n\{F_n^0(t_0 + \delta_n) - F_n^0(t_0)\}\big] = \mathrm{Var}\Big[\sum_{i=1}^n \big(1_{\{t_0 < t_i \le t_0 + \delta_n\}} - F(t_0 + \delta_n) + F(t_0)\big)\Big]. $$

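For the regression function estimation problem let
$\{\epsilon_i\}_{i=-\infty}^{\infty}$ be a stationary sequence of random
variables with $E(\epsilon_i) = 0$ and
$\mathrm{Var}(\epsilon_i) = \sigma^2 < \infty$. Let
$\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^n \epsilon_i)$. The two sided
partial sum process $w_n$ is defined by

$$ w_n\Big(t_i + \frac{1}{2n}\Big) = \begin{cases} \sigma_n^{-1}\big(\frac{\epsilon_0}{2} + \sum_{j=1}^{i} \epsilon_j\big), & i = 0, 1, 2, \dots, \\ \sigma_n^{-1}\big(-\frac{\epsilon_0}{2} - \sum_{j=i+1}^{-1} \epsilon_j\big), & i = -1, -2, \dots, \end{cases} $$

and linearly interpolated between these points. Note that
$w_n \in C(\mathbb{R})$.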

Let $\mathrm{Cov}(k) = \mathrm{Cov}(\xi_1, \xi_{1+k})$ denote the
covariance function of a generic stationary sequence $\{\xi_i\}$, and
distinguish between three cases (of which [a] is a special case of [b]).

[a] Independence: the $\epsilon_i$ are independent.

[b] Weak dependence: $\sum_k |\mathrm{Cov}(k)| < \infty$.

[c] Strong (long range) dependence: $\sum_k |\mathrm{Cov}(k)| = \infty$.
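
The distinction between cases [b] and [c] can be made concrete by looking at partial sums of $|\mathrm{Cov}(k)|$. The two covariance functions below are hypothetical examples chosen for illustration: a geometric decay, whose partial sums converge, and a power law $k^{-d}$ with $0 < d < 1$, whose partial sums grow like $n^{1-d}$.

```python
def partial_abs_cov_sum(cov, n):
    """Sum of |Cov(k)| for k = 1, ..., n."""
    return sum(abs(cov(k)) for k in range(1, n + 1))


def cov_weak(k, rho=0.6):
    """Geometrically decaying covariance (hypothetical example for [b])."""
    return rho ** k


def cov_strong(k, d=0.5):
    """Power-law covariance k**(-d), 0 < d < 1 (hypothetical example for [c])."""
    return k ** (-d)


for n in (10**3, 4 * 10**3, 16 * 10**3):
    print(n, partial_abs_cov_sum(cov_weak, n), partial_abs_cov_sum(cov_strong, n))
```

With $d = 0.5$, quadrupling $n$ roughly doubles the power-law partial sum, while the geometric one has already settled at $\rho/(1-\rho)$.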

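Weak dependence can be further formalized using mixing conditions as
follows: Define two $\sigma$-algebras of a sequence $\{\xi_i\}$ as
$\mathcal{F}_k = \sigma\{\xi_i : i \le k\}$ and
$\bar{\mathcal{F}}_k = \sigma\{\xi_i : i \ge k\}$, where
$\sigma\{\xi_i : i \in I\}$ denotes the $\sigma$-algebra generated by
$\{\xi_i : i \in I\}$. The stationary sequence $\{\xi_i\}$ is said to be
"$\phi$-mixing" or "$\alpha$-mixing" respectively if there is a function
$\phi(n)$ or $\alpha(n) \to 0$ as $n \to \infty$, such that

$$ \sup_{A \in \bar{\mathcal{F}}_n} |P(A \mid \mathcal{F}_0) - P(A)| \le \phi(n), $$

$$ \sup_{A \in \mathcal{F}_0,\, B \in \bar{\mathcal{F}}_n} |P(A \cap B) - P(A)P(B)| \le \alpha(n), \qquad (18) $$

respectively.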

Long range dependence is usually formalized using subordination or
assuming the processes are linear; we will treat only (Gaussian)
subordination.

All limit distribution results stated will be for processes in
$C(-\infty, \infty)$ with the uniform metric on compact intervals and the
Borel $\sigma$-algebra.



4.1 Monotone regression function estimation 

In this section we introduce an estimator of a monotone regression
function. We derive consistency and limit distributions, under general
dependence assumptions.

Assume $m$ is a $C^1$-function on a compact interval
$J \subset \mathbb{R}$, say $J = [0, 1]$ for simplicity; let
$(y_i, t_i)$, $i = 1, \dots, n$ be pairs of data satisfying

$$ y_i = m(t_i) + \epsilon_i, \qquad (19) $$

where $t_i = i/n$.

Define $\bar{y}_n : [1/n, 1] \to \mathbb{R}$ by linear interpolation of
the points $\{(t_i, y_i)\}_{i=1}^n$, and let

$$ x_n(t) = h^{-1} \int k\Big(\frac{t - u}{h}\Big) \bar{y}_n(u)\,du, \qquad (20) $$

be the Gasser-Müller kernel estimate of $m(t)$, cf. Gasser and Müller
[16], where $k$ is a density in $L^2(\mathbb{R})$ with compact support;
for simplicity take $\mathrm{supp}(k) = [-1, 1]$. Let $h$ be the
bandwidth, for which we assume that $h \to 0$, $nh \to \infty$.

To define a monotone estimator of $m$, we put

$$ \hat{m}(t) = T_{[0,1]}(x_n)(t), \quad t \in J, \qquad (21) $$

where $T$ is the monotone rearrangement map. A straightforward
application of Theorem 3 and standard consistency results for regression
function estimators imply the following consistency result:

Proposition 1. The random function $\hat{m}$ defined by (21) is a
uniformly consistent estimator of $m$ in probability uniformly on compact
sets, and in probability in $L^p$ norm.

Clearly $x_n(t) = x_{b,n}(t) + v_n(t)$, with

$$ x_{b,n}(t) = h^{-1} \int k\Big(\frac{t - u}{h}\Big) \bar{m}_n(u)\,du, \qquad (22) $$
$$ v_n(t) = h^{-1} \int k\Big(\frac{t - u}{h}\Big) \bar{\epsilon}_n(u)\,du, $$

where the functions $\bar{m}_n$ and $\bar{\epsilon}_n$ are obtained by
linear interpolation of $\{(t_i, m(t_i))\}_{i=1}^n$ and
$\{(t_i, \epsilon_i)\}_{i=1}^n$ respectively. For the deterministic term
$x_{b,n}(t) \to x_b(t) = m(t)$, as $n \to \infty$. Note that
$\bar{m}_n$, and thus also $x_{b,n}$, is monotone.
Put

$$ \bar{w}_n(t) = \frac{n}{\sigma_n} \int^{t} \bar{\epsilon}_n(u)\,du. \qquad (23) $$

Since $\mathrm{supp}(k) = [-1, 1]$ and if $t \in (1/n + h, 1 - h)$, from
a partial integration and change of variable we obtain

$$ v_n(t) = \frac{\sigma_n}{nh} \int k'(u)\,\bar{w}_n(t - uh)\,du. $$

It can be shown that $w_n$ and $\bar{w}_n$ are asymptotically equivalent
for all dependence structures treated in this paper. Let us now recall
how the two sided partial sum process behaves in the different cases of
dependence we consider:
[a] When the $\epsilon_i$ are independent, we have the classical Donsker
theorem, cf. Billingsley, implying that

$$ w_n \xrightarrow{\mathcal{L}} B, \qquad (24) $$

as $n \to \infty$, with $B$ a two sided standard Brownian motion on
$C(\mathbb{R})$.

[b] Define

$$ \kappa^2 = \mathrm{Cov}(0) + 2 \sum_{k=1}^{\infty} \mathrm{Cov}(k). \qquad (25) $$

Assumption 3. [$\phi$-mixing] Assume $\{\epsilon_i\}_{i \in \mathbb{Z}}$
is a stationary $\phi$-mixing sequence with $E\epsilon_i = 0$ and
$E\epsilon_i^2 < \infty$. Assume further
$\sum_{k=1}^{\infty} \phi(k)^{1/2} < \infty$ and $\kappa^2 > 0$ in (25).

Assumption 4. [$\alpha$-mixing] Assume
$\{\epsilon_i\}_{i \in \mathbb{Z}}$ is a stationary $\alpha$-mixing
sequence with $E\epsilon_i = 0$ and $E\epsilon_i^4 < \infty$,
$\kappa^2 > 0$ in (25) and
$\sum_{k=1}^{\infty} \alpha(k)^{1/2 - \epsilon} < \infty$, for some
$\epsilon > 0$.

Assumption 3 or 4 imply that $\sigma_n^2/n \to \kappa^2$ and that
Donsker's result (24) is valid, cf. Anevski and Hössjer [2] and
references therein.
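
The constant $\kappa^2$ in (25) can be computed explicitly for simple weakly dependent models. The sketch below uses a stationary AR(1) sequence purely as a hypothetical example of a summable covariance function; for it, $\mathrm{Cov}(k) = \sigma_u^2 \rho^{|k|}/(1-\rho^2)$ and the sum in (25) has the closed form $\sigma_u^2/(1-\rho)^2$. The parameter values and function names are assumptions.

```python
def long_run_variance(cov, max_lag):
    """Truncated version of kappa^2 = Cov(0) + 2 * sum_{k>=1} Cov(k)."""
    return cov(0) + 2.0 * sum(cov(k) for k in range(1, max_lag + 1))


def ar1_cov(k, rho=0.5, sigma2=1.0):
    """Covariance of a stationary AR(1) sequence e_i = rho * e_{i-1} + u_i
    with Var(u_i) = sigma2 (hypothetical example of weakly dependent errors)."""
    return sigma2 * rho ** abs(k) / (1.0 - rho ** 2)


kappa2 = long_run_variance(ar1_cov, max_lag=200)
closed_form = 1.0 / (1.0 - 0.5) ** 2   # sigma2 / (1 - rho)^2
print(kappa2, closed_form)
```

For independent errors all covariances at nonzero lags vanish and $\kappa^2$ reduces to $\mathrm{Cov}(0) = \sigma^2$, in line with [a] being a special case of [b].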

[c] We model long range dependent data $\{\epsilon_i\}_{i \ge 1}$ using Gaussian subordination:
more precisely, we write $\epsilon_i = g(\xi_i)$ with $\{\xi_i\}_{i \in \mathbb Z}$ a stationary Gaussian
process with mean zero and covariance function $\mathrm{Cov}(k) = E(\xi_i \xi_{i+k})$ such that
$\mathrm{Cov}(0) = 1$ and $\mathrm{Cov}(k) = k^{-d} l_0(k)$, with $l_0$ a slowly varying function at
infinity$^1$ and $0 < d < 1$ fixed. Furthermore $g : \mathbb R \to \mathbb R$ is a measurable function
with $E\{g(\xi_i)^2\} < \infty$. An expansion of $g(\xi_i)$ in Hermite polynomials is available,

$$g(\xi_i) = \sum_{k=r}^{\infty} \frac{1}{k!}\, \eta_k h_k(\xi_i),$$

where equality holds as a limit in $L^2(\varphi)$, with $\varphi$ the standard Gaussian
density function. The functions $h_k(t) = (-1)^k e^{t^2/2} (d/dt)^k e^{-t^2/2}$ are the Hermite
polynomials of order $k$, the functions

$$\eta_k = E\{g(\xi_1) h_k(\xi_1)\} = \int g(u) h_k(u) \varphi(u)\, du,$$



$^1$ i.e. $l_0(tk)/l_0(t) \to 1$ as $t \to \infty$ for each positive $k$.

are the $L^2(\varphi)$-projections on $h_k$, and $r$ is the index of the first non-zero
coefficient in the expansion. Assuming that $0 < dr < 1$, the subordinated
sequence $\{\epsilon_i\}_{i \ge 1}$ exhibits long range dependence (see e.g. Taqqu [33], [34]),
and Taqqu [33] also shows that

$$\sigma_n^{-1} \sum_{i \le nt} g(\xi_i) \xrightarrow{\mathcal L} z_{r,\beta}(t)$$

in $D[0, 1]$ equipped with the Skorokhod topology, with variance
$\sigma_n^2 = \mathrm{Var}\{\sum_{i=1}^n g(\xi_i)\} = \eta_r^2 n^{2-rd} l_1(n)(1 + o(1))$, where

$$l_1(k) = \frac{2}{r!(1 - rd)(2 - rd)}\, l_0(k)^r. \tag{26}$$

The limit process $z_{r,\beta}$ is in $C[0, 1]$ a.s., and is self similar with parameter

$$\beta = 1 - rd/2. \tag{27}$$

The process $z_{1,\beta}(t)$ is fractional Brownian motion, $z_{2,\beta}(t)$ is the Rosenblatt
process, and the processes $z_{r,\beta}(t)$ are all non-Gaussian for $r \ge 2$, cf. Taqqu
[33]. From these results follows a two sided version of Taqqu's result stating
the behavior of the two sided partial sum process:

$$w_n \xrightarrow{\mathcal L} B_{r,\beta}, \tag{28}$$

in $D(-\infty, \infty)$, as $n \to \infty$, where $B_{r,\beta}$ are the two sided versions of the
processes $z_{r,\beta}$.
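To make the role of the Hermite rank $r$ and of the index $\beta$ in (27) concrete, here is a small sketch that computes the coefficients $\eta_k$ by Gauss-Hermite quadrature and then evaluates $\beta = 1 - rd/2$. It assumes the probabilists' Hermite polynomials (orthogonal with respect to the standard Gaussian density), a centred $g$ so that the search starts at $k = 1$, and illustrative values $g(x) = x^2 - 1$ and $d = 0.3$; the helper names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coefficient(g, k, quad_points=40):
    """eta_k = E[g(xi) h_k(xi)] for xi ~ N(0, 1), by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(quad_points)  # weight function exp(-x^2 / 2)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                           # selects the k-th Hermite polynomial
    h_k = hermeval(nodes, coeffs)
    return float(np.sum(weights * g(nodes) * h_k) / math.sqrt(2.0 * math.pi))

def hermite_rank(g, max_order=10, tol=1e-8):
    """Index of the first non-zero coefficient eta_k, searched over k = 1..max_order."""
    for k in range(1, max_order + 1):
        if abs(hermite_coefficient(g, k)) > tol:
            return k
    return None

def self_similarity_index(r, d):
    """beta = 1 - r*d/2 as in (27); requires 0 < r*d < 1."""
    if not 0.0 < r * d < 1.0:
        raise ValueError("long range dependence requires 0 < r*d < 1")
    return 1.0 - r * d / 2.0

g = lambda x: x ** 2 - 1.0      # equals the second Hermite polynomial
r = hermite_rank(g)             # Hermite rank of g
beta = self_similarity_index(r, d=0.3)
```

For this choice of $g$ the rank is 2, so the limit in (28) would be of Rosenblatt type rather than fractional Brownian motion.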

In the sequel, rescaling is done at the bandwidth rate, so that $d_n = h$.
For $s > 0$, let us consider the following rescaled process:

$$\begin{aligned}
\tilde v_n(s;t) &= d_n^{-1}(nh)^{-1}\sigma_{\hat n} \int w_{\hat n}(h^{-1}t + s - u)\, k'(u)\, du \\
&= d_n^{-1}(nh)^{-1}\sigma_{\hat n} \int w_{\hat n}(s - u)\, k'(u)\, du,
\end{aligned} \tag{29}$$

with $\hat n = [nh]$ the integer part of $nh$, where the last equality holds in
distribution, due to the stationarity (exactly only for $t = t_i$ and asymptotically
otherwise). Note that the right hand side holds also for $s < 0$.

With the bandwidth choice $d_n = h$ we obtain a non-trivial limit process $\tilde v$;
choosing $d_n$ such that $d_n/h \to 0$ leads to a limit "process" equal to a random
variable and $d_n/h \to \infty$ to white noise. In the first case the limit distribution
of $T(x_n)$ on the scale $d_n$ will be the constant 0, while in the second case it
will (formally) be $T(m'(t_0)\,\cdot + \tilde v(\cdot))(0)$, which is not defined ($T$ cannot be
defined for generalized functions, in the sense of L. Schwartz [32]).

Theorem 5. Assume $m$ is monotone on $[0, 1]$ and for some open interval
$I_{t_0} \ni t_0$, $m \in C^1(I_{t_0})$ and $\sup_{t \in I_{t_0}} m'(t) < 0$, with $t_0 \in (0,1)$. Let $x_n$ be
the kernel estimate of $m$ defined above, with a non-negative and compactly
supported kernel $k$ such that $k'$ is bounded, and with bandwidth $h$ specified
below. Suppose that one of the following conditions holds.

[a] $\{\epsilon_i\}$ are independent and identically distributed, $E\epsilon_i = 0$,
$\sigma^2 = \mathrm{Var}(\epsilon_i) < \infty$, and $h = an^{-1/3}$, for an arbitrary $a > 0$,

[b] Assumption 3 or 4 holds, $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^n \epsilon_i)$, $\kappa^2$ is defined in (25),
and $h = an^{-1/3}$, with $a > 0$ an arbitrary constant,

[c] $\epsilon_i = g(\xi_i)$ is a long range dependent subordinated Gaussian sequence
with parameters $d$ and $r$, $h = l_2(n; a)\, n^{-rd/(2+rd)}$, with $a > 0$ and
$n \mapsto l_2(n; a)$ a slowly varying function defined in the proof below.

Then, correspondingly, we obtain

$$h^{-1}\{\hat m(t_0) - m(t_0)\} \xrightarrow{\mathcal L} T[m'(t_0)\,\cdot + \tilde v(\cdot\,; t_0)](0) + m'(t_0) \int u k(u)\, du,$$

as $n \to \infty$, where $\hat m$ is the monotone rearrangement of $x_n$ defined above,

$$\tilde v(s;t) = c \int w(s - u)\, k'(u)\, du, \tag{30}$$

and respectively

[a] $w = B$, $c = \sigma a^{-3/2}$,

[b] $w = B$, $c = \kappa a^{-3/2}$,

[c] $w = B_{r,\beta}$, $c = |\eta_r| a$ (with $\beta$ defined in (27)).

Proof. Theorem 5 is an application of Theorem 4 in the context of monotone
regression functions. Assume first that $d_n = h$ is such that

$$d_n^{-1}(nh)^{-1}\sigma_{\hat n} = d_n^{-2} n^{-1} \sigma_{\hat n} \to c > 0. \tag{31}$$

Then $w_{\hat n} \xrightarrow{\mathcal L} w$ in $D(-\infty, \infty)$, using the supnorm over compact intervals
metric, under the respective assumptions in [a], [b] and [c]. Besides, note
that if $k'$ is bounded and $k$ has compact support, the map

$$C(-\infty, \infty) \ni z(s) \mapsto \int z(s - u)\, k'(u)\, du \in C(-\infty, \infty)$$

is continuous, in the supnorm over compact intervals metric. Thus, under the
assumptions that $k'$ is bounded and $k$ has compact support, the continuous
mapping theorem implies that

$$\tilde v_n(s;t) \xrightarrow{\mathcal L} \tilde v(s;t), \tag{32}$$

where $\tilde v(s;t)$ is defined in (30). This yields the first part of Assumption 1.
Furthermore

$$\begin{aligned}
g_n(s) &= h^{-1} \int \ell(u)\, \bar m_n(t_0 - hu)\, du \\
&= h^{-1} \int \ell(u)\, m(t_0 - hu)\, du + r_n(s),
\end{aligned}$$

with $\ell(v) = k(v + s) - k(v)$ and $r_n$ a remainder term. Since

$$\int \ell(u)\, du = 0, \qquad \int u\, \ell(u)\, du = -s,$$

it follows by a Taylor expansion of $m$ around $t_0$ that the first term converges
towards $As$, with $A = m'(t_0)$. The remainder term is bounded for any $c > 0$
as

$$\sup_{|s| \le c} |r_n(s)| \le h^{-1} \sup_{|s| \le c} \int |\ell(u)|\, du \sup_{|u - t_0| \le (c+1)h} |\bar m_n(u) - m(u)| = O(n^{-1}h^{-1}) = o(1).$$

Furthermore

$$d_n^{-1}\{x_{b,n}(t_0) - m(t_0)\} \to m'(t_0) \int u k(u)\, du =: \Delta, \tag{33}$$

as $n \to \infty$, which proves Assumption 1.

Proof that Assumption 2 holds is relegated to the appendix, see Corollary
2 in Appendix A.1. An application of Theorem 4 then finishes the proof of
Theorem 5. It only remains to check whether $d_n^{-1}(nh)^{-1}\sigma_{\hat n} \to c > 0$ for the
three types of dependence.

- Independent case [a]: We have $\sigma_{\hat n}^2 = \sigma^2 n d_n$. Thus $d_n^{-1}(nh)^{-1}\sigma_{\hat n} =
\sigma n^{-1/2} h^{-3/2}$, and (31) is satisfied with $c = \sigma a^{-3/2}$ if $d_n = h = an^{-1/3}$.

- Mixing case [b]: The proof is similar to the proof of [a], replacing $\sigma$ by
$\kappa$.

- Long range data case [c]: Since $\sigma_{\hat n}^2 = \eta_r^2 (nd_n)^{2-rd}\, l_1(nd_n)$, if we choose
$d_n = h$ we will have

$$d_n^{-2} n^{-1} \sigma_{\hat n} = d_n^{-2} n^{-1} |\eta_r| (nd_n)^{1 - rd/2}\, l_1(nd_n)^{1/2} \to |\eta_r| a \tag{34}$$

if and only if

$$d_n = n^{-rd/(2+rd)}\, l_2(n; a), \tag{35}$$

where $l_2$ is another function slowly varying at infinity, implicitly defined
in (34). Thus (31) follows with $c = |\eta_r| a$ and $h = d_n$ given in (35).
□
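The exponent in (35) can be checked by a short calculation. The following worked step is a sketch that ignores the slowly varying factors $l_1$ and $l_2$, which is an assumption made only to expose the power of $n$.

```latex
% Balance condition from (34), slowly varying factors dropped:
d_n^{-2}\, n^{-1}\, (n d_n)^{1 - rd/2}
  = n^{-rd/2}\, d_n^{-(1 + rd/2)} \asymp 1
\quad\Longleftrightarrow\quad
d_n \asymp n^{-\frac{rd/2}{1 + rd/2}} = n^{-\frac{rd}{2 + rd}} .
% Boundary check: as rd -> 1 the exponent tends to -1/3,
% the rate used in the independent and mixing cases [a] and [b].
```

The boundary check shows that the long range dependent bandwidth rate connects continuously to the cube root rate of the weakly dependent cases.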

Remark 4. The present estimator is similar to the estimator first presented
by Mammen [23]: Mammen proposed to do isotonic regression of a kernel
estimator of a regression function (using bandwidth $h = n^{-1/5}$), whereas
we do monotone rearrangement of a kernel estimator. Mammen's estimator
was extended to dependent data and other bandwidth choices by Anevski and
Hössjer [1], who derived limit distributions for weakly dependent and long range
dependent data that are analogous to our results; for the independent data
case and bandwidth choice $h = n^{-1/3}$ the limit distributions are similar, with
rate of convergence $n^{1/3}$ and nonlinear maps of Gaussian processes.
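The regression estimator of this subsection can be sketched on a grid as follows. This is an illustration under stated assumptions rather than the authors' implementation: the kernel estimate is taken to be of Priestley-Chao type with an Epanechnikov kernel, the design is $t_i = i/n$, the errors are i.i.d. Gaussian, and the rearrangement is approximated by sorting the values of $x_n$ on an equispaced grid. All function names and numerical values are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: non-negative, supported on [-1, 1], bounded derivative inside."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def kernel_regression(t_eval, t_design, y, h):
    """Kernel estimate (nh)^{-1} * sum_i y_i k((t - t_i)/h) for an equispaced design."""
    u = (t_eval[:, None] - t_design[None, :]) / h
    return (epanechnikov(u) * y[None, :]).sum(axis=1) / (len(t_design) * h)

def decreasing_rearrangement(values):
    """Discrete decreasing rearrangement: sort grid values in decreasing order."""
    return np.sort(values)[::-1]

rng = np.random.default_rng(0)
n = 500
t = np.arange(1, n + 1) / n
m = lambda s: 1.0 - s ** 2                 # a decreasing regression function (assumption)
y = m(t) + 0.2 * rng.standard_normal(n)    # i.i.d. errors, case [a]
h = 1.0 * n ** (-1.0 / 3.0)                # h = a n^{-1/3} with a = 1
grid = np.linspace(h, 1.0 - h, 200)        # interior points, away from boundary effects
x_n = kernel_regression(grid, t, y, h)
m_hat = decreasing_rearrangement(x_n)

sup_error_kernel = np.max(np.abs(x_n - m(grid)))
sup_error_rearranged = np.max(np.abs(m_hat - m(grid)))
```

Because sorting is a contraction in the supremum norm and $m$ is itself decreasing on the grid, the rearranged estimate is never further from $m$ in sup norm than the kernel estimate it starts from.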

4.2 Monotone density estimation 

In this subsection we introduce a (monotone) estimator of a monotone den- 
sity function for stationary data, for which we derive consistency and limit 
distributions.

Let $t_1, t_2, \ldots$ denote a stationary process with marginal density function
$f$ lying in the class of decreasing density functions on $\mathbb R^+$, and define the
following estimator of the marginal decreasing density for the sequence $\{t_i\}_{i \ge 1}$:
Consider $x_n(t) = (nh)^{-1} \sum_{i=1}^n k\{(t - t_i)/h\}$, the kernel estimator of the
density $f$, with $k$ a bounded density function supported on $[-1, 1]$ such that
$\int k'(u)\, du = 0$, and $h > 0$ the bandwidth (cf. e.g. Wand and Jones [38]), and
define the (monotone) density estimate

$$\hat f_n(t) = T(x_n)(t), \tag{36}$$

where $T$ is the monotone rearrangement map. Note that $\hat f_n$ is monotone and
positive, and integrates to one, cf. equation (4) of Section 3.3 in Lieb and
Loss.

A straightforward consequence of Theorem 3 and standard convergence
results for the kernel density estimate is the following consistency result:



Proposition 2. The random function $\hat f_n$ defined by (36) is a consistent
estimator of $f$, in probability uniformly on compact sets and in
probability in $L^p$ norm.
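The density estimate (36) can be sketched on a grid by following the definition of the rearrangement through the upper level set function and its generalized inverse. This is an illustrative discretisation only: the Epanechnikov kernel, the exponential sample, the grid on $[0, \max_i t_i + h]$ and the helper names are assumptions of the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a bounded density supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def kernel_density(t_eval, sample, h):
    """Kernel density estimate x_n(t) = (nh)^{-1} * sum_i k((t - t_i)/h)."""
    u = (t_eval[:, None] - sample[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (len(sample) * h)

def rearrange_by_level_sets(values):
    """Decreasing rearrangement on an equispaced grid via the level set function.

    counts[i] is the number of grid points where the function exceeds levels[i]
    (the level set measure z(s), up to the grid step). Output j is the smallest
    level with at most j grid points above it (generalized inverse of z).
    """
    levels = np.sort(values)
    counts = len(values) - np.searchsorted(levels, levels, side="right")
    first = np.searchsorted(-counts, -np.arange(len(values)), side="left")
    return levels[first]

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=400)   # decreasing density on the half line
h = len(sample) ** (-1.0 / 3.0)                 # h = a n^{-1/3} with a = 1
grid = np.linspace(0.0, sample.max() + h, 2000)
x_n = kernel_density(grid, sample, h)
f_hat = rearrange_by_level_sets(x_n)

step = grid[1] - grid[0]
mass_kernel = x_n.sum() * step       # Riemann sum of the kernel estimate
mass_rearranged = f_hat.sum() * step # Riemann sum after rearrangement
```

On the grid the level set construction coincides with sorting the values in decreasing order, and the Riemann sum is unchanged, which is the discrete counterpart of the statement that the rearrangement preserves the integral.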

In the following, the limit distributions for $\hat f_n$ in the independent and
weakly dependent cases are derived. We will in particular make use of recent
results on the weak convergence $w_{n,d_n} \xrightarrow{\mathcal L} w$, on $D(-\infty, \infty)$, as $n \to \infty$, for
independent and weakly dependent data $\{t_i\}$, derived in Anevski and Hössjer
[1].

The kernel estimator can be written $x_n = x_{b,n} + v_n$ with

$$\begin{aligned}
x_n(t) &= h^{-1} \int k'(u)\, F_n(t - hu)\, du, \\
x_{b,n}(t) &= h^{-1} \int k'(u)\, F(t - hu)\, du, \\
v_n(t) &= h^{-1} \int k'(u)\, F_n^0(t - hu)\, du.
\end{aligned} \tag{37}$$



Rescaling is done on a scale $d_n$ that is of the same asymptotic order as $h$, so
that we put $d_n = h$. The rescaled process is

$$\tilde v_n(s; t_0) = c_n \int k'(u)\, w_{n,d_n}(s - u; t_0)\, du,$$

with $c_n = d_n^{-1}(nh)^{-1}\sigma_{n,d_n}$.

Theorem 6. Let $\{t_i\}_{i \ge 1}$ be a stationary sequence with a monotone marginal
density function $f$ such that $\sup_{t \in I_{t_0}} f'(t) < 0$ and $f \in C^1(I_{t_0})$ for an open
interval I to 3 t Q where t > 0. Assume that ~Etf < oo. Let x n be the kernel 
density estimate defined above, with $k$ a bounded and compactly supported
density such that $k'$ is bounded. Suppose that one of the following conditions
holds:

[a] $\{t_i\}_{i \ge 1}$ is an i.i.d. sequence,

[b] 1) $\{t_i\}_{i \ge 1}$ is a stationary $\phi$-mixing sequence with $\sum_{i=1}^{\infty} \phi^{1/2}(i) < \infty$;

2) $f(t_0) = F'(t_0)$ exists, as well as the joint density $f_k(s_1, s_2)$ of
$(t_1, t_{1+k})$ on $[t_0 - \delta, t_0 + \delta]^2$ for some $\delta > 0$, and $k \ge 1$;

3) $\sum_{k=1}^{\infty} M_k < \infty$ holds, for $M_k = \sup_{t_0 - \delta \le s_1, s_2 \le t_0 + \delta} |f_k(s_1, s_2) - f(s_1) f(s_2)|$.

Then choosing $h = an^{-1/3}$ and $a > 0$ an arbitrary constant, we obtain

$$n^{1/3}\{\hat f_n(t_0) - f(t_0)\} \xrightarrow{\mathcal L} a\, T[f'(t_0)\,\cdot + \tilde v(\cdot\,; t_0)](0) + a f'(t_0) \int u k(u)\, du,$$

as $n \to \infty$, where $\tilde v(s;t)$ is as in (30), with $c = a^{-3/2} f(t_0)^{1/2}$, and $w$ a
standard two sided Brownian motion.

Proof. If $k'$ is bounded and $k$ has compact support, the continuity of the
map

$$C(-\infty, \infty) \ni z(s) \mapsto \int z(s - u)\, k'(u)\, du \in C(-\infty, \infty)$$

implies that, choosing $d_n$ such that $c_n \to c$ as $n \to \infty$ for some constant $c$,
one gets:

$$\tilde v_n(s; t_0) \xrightarrow{\mathcal L} c \int k'(u)\, w(s - u; t_0)\, du =: \tilde v(s; t_0), \tag{38}$$

on $C(-\infty, \infty)$, as $n \to \infty$, thanks to the continuous mapping theorem. Here
$w$ is the weak limit of $\{w_n\}$. Theorems 7 and 8 of Anevski and Hössjer [1]
state that $w_{n,d_n}(s; t_0) \xrightarrow{\mathcal L} B(s)$ on $D(-\infty, \infty)$ under the respective
assumptions in [a] and [b], where $B(s)$ is a two sided standard Brownian motion.
This establishes the first part of Assumption 1 for both cases [a] and [b].

Next notice that $x_{b,n}(t) = h^{-1} \int k\{(t - u)/h\} f(u)\, du$ is monotone. A change of
variable and a Taylor expansion in $x_{b,n}$ prove the second part of Assumption
1 with $A = f'(t_0)$ and

$$d_n^{-1}\{x_{b,n}(t_0) - f(t_0)\} \to f'(t_0) \int u k(u)\, du = \Delta.$$

The statement of Assumption 2 is relegated to the appendix, see Corollary
3 in Appendix A.2. Theorem 6 therefore holds as an application of
Theorem 4.

Let us finally check that the scale $d_n$ can be chosen so that $c_n \to c$, as
assumed at the beginning of the proof:

- Independent data case [a]: We have $\sigma_{n,d_n}^2 \sim n d_n f(t_0)$, so that

$$d_n^{-1}(nh)^{-1}\sigma_{n,d_n} \sim d_n^{-3/2} n^{-1/2} f(t_0)^{1/2}.$$

Choosing $d_n = an^{-1/3}$ we get $c = a^{-3/2} f(t_0)^{1/2}$.

- Mixing data case [b]: Similar to the proof of case [a].



Remark 5. The present estimator was first proposed for independent data
by Fougères [15], who stated the strong consistency uniformly over $\mathbb R^+$ for
$T(f_n)$ and derived some partial results for the limit distribution. The results
for the monotone density function estimator are similar to the results for
the Grenander estimator (the NPMLE) of a monotone density, in that we
have cube root asymptotics and a limit random variable that is a nonlinear
functional of a Gaussian process, for independent and weakly dependent data;
see Prakasa Rao and Wright for the independent data cases, and
Anevski and Hössjer [1] for the weakly dependent data cases. In our case
however we obtain one extra term that arises from the bias in the kernel
estimator. Our estimator is really closer in spirit to the estimator obtained
by projecting the kernel estimator on the space of monotone functions (i.e.
kernel estimation followed by isotonic regression) first proposed by Anevski
and Hössjer [1]; note that we obtain the same bias term as in Anevski and
Hössjer [1].

Remark 6. The results for the long range dependence case are similar to the
result for the isotonic regression of a kernel estimator, cf. Anevski and
Hössjer [1]. In this situation $\tilde v_n(s; t_0)$ is asymptotically a linear function of $s$ with
a random slope, implying that the monotone rearrangement of $g_n + \tilde v_n$ is just
$g_n + \tilde v_n$, which evaluated at zero is zero. This is due to the fact that for long
range dependent data the limit process of the empirical process is a
deterministic function multiplied by a random variable, cf. the remark after Theorem
12 in Anevski and Hössjer [1]. Thus the limit distribution for the final
estimator for long range dependent data is the same as the limit distribution for
the kernel estimator itself, i.e. $n^{d/2}\{\hat f_n(t) - f(t)\}$ and $n^{d/2}\{x_n(t) - f(t)\}$ have
the same distributional limit. See Csörgő and Mielniczuk for a derivation
of this limit distribution.



5 Conclusions 

We considered the problem of estimating an arbitrary monotone function x, via
a monotone rearrangement of a "preliminary" estimator x n of the unknown 
x. We derived consistency and limit distribution results for the monotonized 
estimator that hold under rather general dependence assumptions. 

Our approach is similar in spirit to the general methods studied in Anevski
and Hössjer [1] and first introduced in the regression estimation setting by
Mammen [23]: start with a preliminary estimator and make it monotone by
projecting it on the space of monotone functions. The present approach can
however in some respects be considered preferable: the monotone
rearrangement, being basically a sorting, is a simpler procedure than an $L^2$-projection.
Furthermore the consistency and limit distribution results indicate similar 
properties to Mammen's and Anevski and Hossjer's estimators. Besides, an 
important advantage of our estimator is the finite sample behavior: Mam- 
men's estimator is monotone but not necessarily smooth; Mammen actually 
studied two approaches, one with kernel smoothing followed by monotoniza- 
tion and the other approach the other way around, i.e. monotonization 
followed by kernel smoothing. Mammen showed that the two proposals 
are first-order equivalent. However, their finite sample size properties are 
very different: the first resulting estimator is monotone but not necessarily 
smooth, while the other is smooth but not necessarily monotone, so that 
one needs to choose which property is more important. This is not the case 
with our estimator, since if we start with a smooth estimator of the function,
e.g. a kernel estimator, the monotone rearrangement will be smooth as well.
This can however become a disadvantage, for instance when the estimand is
discontinuous: then the monotone rearrangement will "oversmooth" since it
will give a continuous result, while Mammen's estimator will keep more of
the discontinuity intact.
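The difference between the two monotonization steps compared above can be seen on a small vector. The sketch below contrasts sorting (the discrete monotone rearrangement) with the $L^2$-projection onto decreasing sequences, computed here by a basic pool-adjacent-violators loop. The function names and the toy input are hypothetical, and the projection routine is a generic textbook version rather than the estimator of Mammen or of Anevski and Hössjer.

```python
import numpy as np

def decreasing_rearrangement(values):
    """Monotone rearrangement of a vector: sort in decreasing order."""
    return np.sort(np.asarray(values, dtype=float))[::-1]

def decreasing_isotonic_regression(values):
    """L2 projection onto nonincreasing sequences via pool-adjacent-violators."""
    blocks = []  # each block is [mean, size]
    for v in np.asarray(values, dtype=float):
        blocks.append([v, 1])
        # Merge while the newest block mean exceeds the previous one,
        # i.e. while the nonincreasing constraint is violated.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            mean2, size2 = blocks.pop()
            mean1, size1 = blocks.pop()
            size = size1 + size2
            blocks.append([(mean1 * size1 + mean2 * size2) / size, size])
    return np.concatenate([np.full(size, mean) for mean, size in blocks])

x = np.array([5.0, 3.0, 4.0, 1.0, 2.0])
sorted_x = decreasing_rearrangement(x)           # permutes the input values
projected_x = decreasing_isotonic_regression(x)  # averages over violating blocks
```

Sorting returns a permutation of the input, so no new values are created, whereas the projection replaces each violating stretch by its average and therefore produces flat pieces.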

Some simulation studies are available in the literature, which exhibit the
small sample size behavior of the rearrangement of a kernel estimator of a
density, and compare it to different competitors. See e.g. Fougères [15],
Meyer and Woodroofe [25], Hall and Kang [20], Chernozhukov et al.
These references deal with independent data. A larger panel of dependence
situations in the comparisons would clearly be of interest, and this will be
the object of future work.

Note that our results are geared towards local estimates, i.e. estimates 
that use only a subset of the data and that are usually estimators of esti- 
mands that can be expressed as non-differentiable maps of the distribution 
function such as e.g. density functions, regression functions, or spectral den- 
sity functions. This differs from global estimates, as those considered for 
example by Chernozhukov et al. for quantile estimation.

An approach similar to ours for local estimates is given in Dette et al.,
using a modified version of the Hardy-Littlewood-Pólya monotone rearrangement:
the first step consists of calculating the upper level set function
and is identical to ours. However in the second step they use a smoothed 
version of the (generalized) inverse, which avoids nonregularity problems for 
the inverse map. The resulting estimator is therefore not rate-optimal, and 
the limit distributions are standard Gaussian due to the oversmoothing. 

Work has been done here using kernel based methods for the preliminary
estimator $x_n$ of $x$. Other methods, such as wavelet based ones, are
possible, and let us emphasize that the only assumptions required are given in
Assumptions 1 and 2.

We have studied applications to density and regression function estimation.
Other estimation problems that are possible to treat with our methods
are e.g. spectral density estimation, considered by Anevski and Soulier,
and deconvolution, previously studied by van Es et al. [36] and Anevski
[3].

A Maximal bounds for rescaled partial sum and empirical processes

In this section we derive conditions under which Assumption 2 holds, for the
density and regression function estimation cases. Recall that

$$\begin{aligned}
g_n(s) &= d_n^{-1}\{x_{b,n}(t_0 + s d_n) - x_{b,n}(t_0)\}, \\
\tilde v_n(s) &= d_n^{-1} v_n(t_0 + s d_n).
\end{aligned} \tag{39}$$

Since under Assumption 1

$$y_n(s) - \{g_n(s) + \tilde v_n(s)\} = d_n^{-1}\{x_{b,n}(t_0) - x(t_0)\} \to \Delta,$$

as $n \to \infty$, and $|\Delta| < \infty$, establishing Assumption 2 for the process $g_n + \tilde v_n$
implies that it holds also for the process $y_n$. Therefore it is enough
to establish Assumption 2 for $y_n$ replaced by $g_n + \tilde v_n$.

Recall that for the cases that we cover the rescaled processes are of the
form

$$\tilde v_n(s; t_0) = c_n \int k'(u)\, z_n(s - u; t_0)\, du,$$

with $z_n = w_{n,d_n}$ the local rescaled empirical process in the density estimation
case and $z_n = w_{\hat n}$ the partial sum process in the regression case. This implies
that for the density estimation case the support of $\tilde v_n$ is stochastic, since it
depends on $\max_{1 \le i \le n} t_i$, while for the regression estimation case it does not
depend on the data $\{\epsilon_i\}$ and is as a matter of fact compact and deterministic.

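Lemma 5. Let $\mathrm{supp}(k) \subset [-1, 1]$. Suppose that Assumption 1 holds. Assume
that $t_0$ has a neighbourhood $I = [t_0 - \epsilon, t_0 + \epsilon]$ such that $\tau := \sup_{t \in I} x'(t) < 0$.
Suppose also that

$$x'_{b,n}(t + s d_n) \to x'(t), \tag{40}$$

as $n \to \infty$, for all $t \in I$.

Then (12) and (13) written for $z_n = g_n + \tilde v_n$ are implied by the two results:

(A). For every $\delta > 0$ and $0 < c < \infty$ there is a finite $M > 0$ such that

$$\liminf_{n \to \infty} P\Big[ \bigcap_{s \in (c,\, d_n^{-1}\epsilon)} \Big\{ \tilde v_n(s) < \frac{M}{2} - \tau(s - c) \Big\} \Big] > 1 - \delta.$$

(B). For every $\delta > 0$ and finite $M > 0$ there is a finite $C$ such that for each
$c > C$

$$\limsup_{n \to \infty} P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \ell(n))} \tilde v_n(s) > \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} < \delta, \tag{41}$$

where $\ell(n)$ is a deterministic function which satisfies either of

(i) $\liminf_{n \to \infty} P\{\max_{1 \le i \le n} t_i < \ell(n)\} = 1$,

or

(ii) $\ell(n) = \max \mathrm{supp}(x_n)$ if $\limsup_{n \to \infty} \max \mathrm{supp}(x_n) \le K < \infty$.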

n— >oo 

Condition (A) can be seen as boundedness on small sets (i.e. on the
sets $(c, d_n^{-1}\epsilon)$), while the conditions in (B) are bounds outside of small sets;
the small sets are really compact (of the form $(0, \epsilon)$) on the $t$-scale, and are
increasing due to the rescaling done for the $s$-scale.

Condition (B)(ii) is appropriate for the regression function estimation
case, since then $\limsup_{n \to \infty} \max(\mathrm{supp}(x_n))$ is bounded by $1 + \max(\mathrm{supp}(k)) = 2$,
while for the density estimation case we will have to invoke the more subtle
assumptions in (B)(i).

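Proof. In order to show (12), we first prove that for each $\delta > 0$ there is a
$0 < M < \infty$ and a $0 < c < \infty$ such that

$$\liminf_{n \to \infty} P\Big\{ \sup_{s > c} (g_n + \tilde v_n)(s) < -M \Big\} > 1 - \delta. \tag{42}$$

Let $g_n$ be defined in (39). Consider the function

$$k_n(s) = \begin{cases}
\tau s, & \text{on } (-\epsilon d_n^{-1}, \epsilon d_n^{-1}), \\
\tau \epsilon d_n^{-1}, & \text{on } (\epsilon d_n^{-1}, \infty), \\
-\tau \epsilon d_n^{-1}, & \text{on } (-\infty, -\epsilon d_n^{-1}).
\end{cases}$$

Then from (40) we obtain

$$g_n(s) \le k_n(s) \text{ on } \mathbb R^+, \qquad g_n(s) \ge k_n(s) \text{ on } \mathbb R^-,$$

for all large enough $n$, since $g_n$ is decreasing (as weighted mean of decreasing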
functions) and $g_n(0) = 0$.

Let $\delta$ be given and suppose part (A) of the assumptions is satisfied, with
some $M$ and arbitrary $0 < c < \infty$. We will consider the hypotheses (B)(i)
and (B)(ii) separately:

(B)(i) Since the kernel $k$ has support in $[-1, 1]$ one has $\mathrm{supp}(x_n) \subset
(\min_{1 \le i \le n} t_i - h, \max_{1 \le i \le n} t_i + h)$. Using the rescaling $t = t_0 + s d_n$ this
implies that

$$\mathrm{supp}(g_n),\ \mathrm{supp}(\tilde v_n) \subset -t_0 + d_n^{-1}(\min t_i - h, \max t_i + h) =: I_n.$$

Since $t_0 > \min t_i$ and $h$ is positive, the supremum over all $s \in I_n$ can be
replaced by a supremum over all $s \in (c, d_n^{-1}\max t_i)$, as $n$ tends to $\infty$, and
thus we need to show

$$\liminf_{n \to \infty} P\Big\{ \sup_{s \in (c,\, d_n^{-1}\max t_i)} (g_n + \tilde v_n)(s) < -M \Big\} > 1 - \delta. \tag{43}$$
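Then for $c > 3M/(2|\tau|)$, we will have $k_n(c) = -3M/2$. This implies that for
$c > 3M/(2|\tau|)$,

$$\begin{aligned}
P\Big\{ \sup_{s \in (c,\, d_n^{-1}\max t_i)} y_n(s) < -M \Big\}
\ge P\Big( &\bigcap_{s \in (c,\, d_n^{-1}\epsilon)} \Big\{ \tilde v_n(s) < \frac{M}{2} - \tau(s - c) \Big\} \\
&\cap \Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \max t_i)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} \Big),
\end{aligned}$$

so that (43) follows from the two results

$$\liminf_{n \to \infty} P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \max t_i)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} > 1 - \delta, \tag{44}$$

$$\liminf_{n \to \infty} P\Big[ \bigcap_{s \in (c,\, d_n^{-1}\epsilon)} \Big\{ \tilde v_n(s) < \frac{M}{2} - \tau(s - c) \Big\} \Big] > 1 - \delta. \tag{45}$$

The relation (45) is satisfied by assumption (A) and thus we need to treat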
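(44). Let $\ell$ be the deterministic function given in assumption (B)(i). Note
first that

$$\begin{aligned}
P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \ell(n))} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\}
&\le P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \ell(n))} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \,\Big|\, \max_{1 \le i \le n} t_i < \ell(n) \Big\} + P\Big\{ \max_{1 \le i \le n} t_i \ge \ell(n) \Big\} \\
&\le P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \max t_i)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \,\Big|\, \max_{1 \le i \le n} t_i < \ell(n) \Big\} + \delta
\end{aligned}$$

for all $n > N$ for some $N$, since $\lim_{n \to \infty} P\{\max_{1 \le i \le n} t_i > \ell(n)\} = 0$.
Therefore, for all $n > N$, we have

$$\begin{aligned}
P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \max t_i)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\}
&\ge P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \max t_i)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \,\Big|\, \max_{1 \le i \le n} t_i < \ell(n) \Big\}\, P\Big\{ \max_{1 \le i \le n} t_i < \ell(n) \Big\} \\
&\ge \Big[ P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \ell(n))} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} - \delta \Big]\, P\Big\{ \max_{1 \le i \le n} t_i < \ell(n) \Big\}.
\end{aligned}$$

Thus since $\lim_{n \to \infty} P\{\max_{1 \le i \le n} t_i < \ell(n)\} = 1$, taking complements leads to
(44) as soon as for $c > C$

$$\limsup_{n \to \infty} P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \ell(n))} \tilde v_n(s) > \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} < \delta,$$

i.e. (41).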

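(B)(ii). It follows from the definition of $K$ and from $\mathrm{supp}(k) \subset [-1, 1]$
that $\mathrm{supp}(x_n) \subset (-h, K + h)$, so that

$$\mathrm{supp}(g_n),\ \mathrm{supp}(\tilde v_n) \subset -t_0 + d_n^{-1}(-h, h + K) =: I_n.$$

Again this implies that the supremum of $\tilde v_n$ over $I_n$ can be replaced by a
supremum over all $s \in (c, d_n^{-1}K)$ and thus (42) will follow as soon as

$$\liminf_{n \to \infty} P\Big\{ \sup_{s \in (c,\, d_n^{-1}K)} (g_n + \tilde v_n)(s) < -M \Big\} > 1 - \delta. \tag{46}$$

For arbitrary $M$ and $c > 3M/(2|\tau|)$ we have

$$\begin{aligned}
P\Big\{ \sup_{s \in (c,\, d_n^{-1}K)} (g_n + \tilde v_n)(s) < -M \Big\}
\ge P\Big( &\bigcap_{s \in (c,\, d_n^{-1}\epsilon)} \Big\{ \tilde v_n(s) < \frac{M}{2} - \tau(s - c) \Big\} \\
&\cap \Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, K)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} \Big),
\end{aligned}$$

so that (46) follows from

$$\liminf_{n \to \infty} P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, K)} \tilde v_n(s) < \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\} > 1 - \delta, \tag{47}$$

and (45), which ends the derivation for the case (ii).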
Now we prove that with $M$ as above

$$\liminf_{n \to \infty} P\Big\{ \inf_{\inf I_0 \le s \le \sup I_0} y_n(s) > -M \Big\} > 1 - \delta. \tag{48}$$

Note that with $M = M_c$ corresponding to the bound for $c$, we have $k_n(c) =
-3M_c/2$ and thus $g_n(c) \le -3M_c/2 < -M_c - M_c/2$. Since $g_n(s) \to As$ on
compact intervals, if $n$ is large enough then we have $g_n(s) > As - \epsilon$ for each
$\epsilon > 0$ arbitrarily small. Thus for $s_0 = -M_c/2A$

$$g_n(s_0) > -\frac{M_c}{2} - \epsilon,$$

for $n$ large enough. Finally from (42) and by the symmetry of the distribution
of $\tilde v_n$ around 0, we have that with $M$ replaced by $\max\{M_c, -2A \sup I_0\}$, both
(42) and (48) hold, and (12) is proven.
Equation (13) can be proven in a similar way, which yields the lemma. □

Lemma 5 states conditions (A) and (B) as sufficient conditions for Assumption
2. To further simplify condition (B) in Lemma 5, using Boole's inequality
and the stationarity of the process $\tilde v_n$ we get in both cases (i) and
(ii)

$$P\Big\{ \sup_{s \in d_n^{-1}(\epsilon,\, \ell(n))} \tilde v_n(s) > \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\}
\le d_n^{-1}\ell(n)\, P\Big\{ \sup_{s \in (0,1)} \tilde v_n(s) > \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\}, \tag{49}$$

where $\ell(n)$ is defined for hypothesis (i) and replaced by $K$ when dealing
with hypothesis (ii). As a consequence, in Case (i) (resp. Case (ii)) the
probability (49) will converge to 0 as soon as

$$a(n) := P\Big\{ \sup_{s \in (0,1)} \tilde v_n(s) > \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big\}$$

converges to 0 faster than $d_n^{-1}\ell(n) \to \infty$, i.e. that $a(n) = o(d_n \ell(n)^{-1})$ as $n \to \infty$ (resp.
$a(n) = o(d_n)$). The following conditions are thus respectively sufficient to
ensure that (49) tends to 0, as $n \to \infty$:

(i) $P\{\max_{1 \le i \le n} t_i < \ell(n)\} \to 1$ and $d_n^{-1}\ell(n)\, a(n) \to 0$,

(ii) $d_n^{-1} a(n) \to 0$.

Finally, the examination of the convergence of $a(n)$ can be made in two
steps via the standard partition

$$a(n) \le P\Big\{ \sup_{s, s' \in (0,1)} |\tilde v_n(s) - \tilde v_n(s')| > \frac12 \Big( \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big) \Big\}
+ P\Big\{ \tilde v_n(0) > \frac12 \Big( \frac{M}{2} - \tau(d_n^{-1}\epsilon - c) \Big) \Big\}. \tag{50}$$

In the sequel we will bound the two terms of the right-hand side of (50)
separately, for the density and regression estimation problems treated in this
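paper: see subsections A.1 and A.2.

To further simplify (A) in Lemma 5 note that

$$\bigcap_{s \in (c,\, d_n^{-1}\epsilon)} \Big\{ \tilde v_n(s) < \frac{M}{2} - \tau(s - c) \Big\}
\supset \bigcap_{i \in \mathbb Z \cap (c,\, d_n^{-1}\epsilon)} \Big\{ \sup_{s \in [i, i+1)} \tilde v_n(s) < \frac{M}{2} - \tau(i - c) \Big\} =: A_n.$$

Thus, taking complements, part (A) of Lemma 5 follows as soon as for
every $\delta$ and arbitrary $0 < c < \infty$ there is a $0 < M < \infty$ such that
$\limsup_{n \to \infty} P(A_n^c) < \delta$. However,

$$\begin{aligned}
P(A_n^c) &\le \sum_{i \in \mathbb Z \cap (c,\, d_n^{-1}\epsilon)} P\Big\{ \sup_{s \in [i, i+1)} \tilde v_n(s) > \frac{M}{2} - \tau(i - c) \Big\} \\
&= \sum_{i \in \mathbb Z \cap (c,\, d_n^{-1}\epsilon)} P\Big\{ \sup_{s \in [0, 1)} \tilde v_n(s) > \frac{M}{2} - \tau(i - c) \Big\},
\end{aligned}$$

where the equality follows from the stationarity of $\tilde v_n$. In the sequel we will
establish maximal inequalities of the form

$$P\Big\{ \sup_{s \in [0,1)} \tilde v_n(s) > a \Big\} \le C a^{-p} \tag{51}$$

for some constant $p > 1$; assume for now that these are established. Then

$$\limsup_{n \to \infty} P(A_n^c) \le \sum_{i > c} \frac{C}{\big( \frac{M}{2} - \tau(i - c) \big)^p}
\le \frac{C}{p |\tau|^p} \Big( \frac{2}{M} \Big)^{p-1} < \delta,$$

where the next to last inequality holds by an integral approximation of the
series and the last by choosing $M = M(\delta) > 2(C/p\delta|\tau|^p)^{1/(p-1)}$. Thus
assumption (A) in Lemma 5 follows from (51) with $p > 1$; inequalities of the
form (51) will next be treated.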






A.1 Maximal bounds for the rescaled partial sum process

Let $k$ be a kernel which is bounded, piecewise differentiable, with a bounded
derivative, say $0 \le |k'| \le \alpha$. Assume that the sequence $h = h_n$ is such that
$nh \to \infty$. We have (see (29))

$$\tilde v_n(s; t_0) = d_n^{-1}(nh)^{-1}\sigma_{\hat n} \int w_{\hat n}(s - u)\, k'(u)\, du,$$

where $d_n = h$ is chosen so that $d_n^{-1}(nh)^{-1}\sigma_{\hat n} \to 1$ and $\hat n = [nh]$. Now $w_{\hat n}$
is asymptotically equivalent to the piecewise constant partial sum process
which we therefore will use for notational simplicity, and which we denote
(with a slight abuse of notation) with $w_{\hat n}$.

We show the convergence of $a(n)$ in which $\tilde v_n(s)$ is replaced by $w_{\hat n}(s)$:
this will be sufficient since

$$|\tilde v_n(s)| \le \sup_{u \in [-1,1]} |w_{\hat n}(s - u)| \int |k'(u)|\, du,$$

and thus

$$\sup_{s \in (0,1)} |\tilde v_n(s)| \le C \sup_{s \in (0,1)} \sup_{u \in [-1,1]} |w_{\hat n}(s - u)| \le C \sup_{s \in [-1,2]} |w_{\hat n}(s)|,$$

with $C = \int |k'(u)|\, du$, and since the behaviour of the process $w_{\hat n}$ on $(0, 1)$ and
on $(-1, 2)$ is qualitatively the same.

Proposition 3. Let $p \ge 2$ be given and assume that the sequence $\{\epsilon_i\}_{i \ge 1}$
satisfies $\max(E\epsilon_i^2, E|\epsilon_i|^p) < \infty$. Then under the assumptions of Theorem 5

$$P\Big( \sup_{s \in (0,1)} w_{\hat n}(s) > M/2 - \tau(\delta h^{-1} - c) \Big) \le C h^p,$$

where $C$ is a finite constant.

Proof. Let $a := M/2 + \tau c$ and $b := -\tau\delta$. In a first step, we obtain a
bound for

$$P\Big( \sup_{s, s' \in (0,1)} |w_{\hat n}(s) - w_{\hat n}(s')| > a + bh^{-1} \Big)$$

in the 3 dependence situations listed in Theorem 5. One has

$$w_{\hat n}(s) - w_{\hat n}(s') = \frac{1}{\sigma_{\hat n}} S_n(s, s'), \tag{52}$$

with

$$S_n(s, s') = \sum_{i = [s'\hat n] + 1}^{[s\hat n]} \epsilon_i.$$
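[a] If $\{\epsilon_i\}$ is an i.i.d. sequence the moment bound in Theorem 2.9 in
Petrov [27] implies

$$\begin{aligned}
E|S_n(s, s')|^p &\le c(p) \Big( \sum_{i = [s'\hat n] + 1}^{[s\hat n]} E|\epsilon_i|^p + \Big( \sum_{i = [s'\hat n] + 1}^{[s\hat n]} E\epsilon_i^2 \Big)^{p/2} \Big) \\
&\le c' \big( |s - s'|\hat n + |s - s'|^{p/2}\hat n^{p/2} \big),
\end{aligned}$$

where $c(p)$ depends on $p$ only and $c' = c(p) \cdot \max(\|\epsilon_1\|_2^p, E|\epsilon_1|^p)$.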



[b] If $\{\epsilon_i\}$ is a stationary sequence that is $\alpha$-mixing (which also covers
the $\phi$-mixing case) satisfying the mixing condition (18), then Theorem 1 in Doukhan
implies



E\S n (s,s')\ p < max(h\s- s'\M Pte ,h p/2 \s- s'\ p/2 M^ 2 2 ) 



where M„ 



and thus 



E|S n (s,s')| p < ^'max^ls-s'l^n^ls-s'l 2 ), 

with c" = m&x((E\ei\ p+e ) p /( p+t \ (E|e;| 2+e ) 2 /( 2+e )). 

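Therefore, for both independence and weak dependence cases, equation
(12.42) of Billingsley [4] is satisfied, so that Theorem 12.2 in Billingsley
implies

$$\begin{aligned}
P\Big( \sup_{s, s' \in (0,1)} |w_{\hat n}(s) - w_{\hat n}(s')| > a + bh^{-1} \Big)
&= P\Big( \max_{k \in ([s'\hat n] + 1,\, [s\hat n])} \sigma_{\hat n}^{-1} \Big| \sum_{i = [s'\hat n] + 1}^{k} \epsilon_i \Big| > a + bh^{-1} \Big) \\
&\le \frac{K' C(\hat n)}{\sigma_{\hat n}^p (a + bh^{-1})^p},
\end{aligned}$$

where $C(\hat n) = c'(\hat n + \hat n^{p/2})$ for i.i.d. data and $C(\hat n) = c'' \max(\hat n, \hat n^{p/2})$ in the
mixing case. Since in both cases $\sigma_{\hat n} = \hat n^{1/2}$ and thus $\hat n/\sigma_{\hat n}^p = \hat n^{1 - p/2}$ and
$\hat n^{p/2}/\sigma_{\hat n}^p = 1$, we get the bound

$$P\Big( \sup_{s, s' \in (0,1)} |w_{\hat n}(s) - w_{\hat n}(s')| > a + bh^{-1} \Big) \le C h^p, \tag{53}$$

if $p \ge 2$.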

[c] In the long range dependent case we have 

E{Sl) ~ r?Mn)n 2 ^ 



(53) 



with Zi as in (|26p . and according to de Haan [lOj], equation (12.42) in Billings- 
ley 0] is satisfied, with 

7 = 2, 
a = 2/3, 



2/3 



for some constant C\ > 0. Theorem 12.2 in Billingsley |4j then leads to 



P | max a fl l > a + &/i 1 < 

ke([s'h)+i,[sh]) n , f-f i 

i= Is'n +1 



K' 

-"■2,2/3 



(a + frfr- 1 ) 2 ^ 



E 



U; 



c 



(a + bh" 1 ) 2 ' 



with C = CiK' 22 g, as a? = r] 2 li(n)n 2f3 . Thus in the long range dependent 
case (1531 holds for p = 2. 

In a second step, using $w_{\hat n}(s) - w_{\hat n}(s') = w_{\hat n}(s - s')$, one can then deduce
from (53) that

$$P\Big( \sup_{s \in (0,1)} w_{\hat n}(s) > a + bh^{-1} \Big) \le C'' h^p,$$

where $C'' > 0$, which together with (53) and via (50) ends the proof. □

Corollary 2. Suppose the assumptions of Theorem 5 are satisfied; then
Assumption 2 holds for $y_n = g_n + \tilde v_n$ and for $y$ as defined in (11) in each context
[a], [b] and [c] listed in Theorem 5.

Proof. Note first that if $x_{b,n}$ is defined by (22), if $m$ is a $C^1$-function, and
$k$ is a kernel with compact support, then $x'_{b,n}(t + s d_n) \to m'(t)$ for each $t$
when $n \to \infty$. Besides, a consequence of Proposition 3 is that $a(n) \le C d_n^p$,
with $d_n = h$.
Thus, condition (B)(ii) in Lemma 5 is satisfied as soon as $d_n^{-1} a(n) = O(d_n^{p-1}) \to 0$,
which is equivalent to $p > 1$. Thus the existence of two moments suffices to
get condition (B)(ii) of Lemma 5 for i.i.d., mixing and subordinated Gaussian
long range dependent sequences. Condition (A) of Lemma 5 follows
immediately from Proposition 3. Hence Lemma 5 can be applied, so that
Assumption 2 holds for $y_n = g_n + \tilde v_n$. An analogous result for $y$ defined by
(11) follows easily from the stationarity and finite second moments of $\tilde v_n$. □



A.2 Maximal bounds for the rescaled empirical process

The rescaled process is

$$\tilde v_n(s; t_0) = c_n \int k'(u)\, w_{n,d_n}(s - u; t_0)\, du,$$

with $c_n = d_n^{-1}(nh)^{-1}\sigma_{n,d_n}$. Note that, similarly to the regression case,
deriving the maximal bound for the process $w_{n,d_n}$ implies the maximal bound for
the process $\tilde v_n$.

Proposition 4. Under the assumptions of Theorem 6 there exists a positive
constant $C$ such that



limsupP sup v n (s) < — — r{h e — c) > 1 — 5. 




P 



sup w. 
se(o,i) 



n. 




-1

Proof. Let $a := M/2 + \tau c$ and $b := -\tau\epsilon$. For independent or mixing data
satisfying the assumptions of Theorem 6, Lemma C2 in Anevski and Hössjer
[1] implies that



P< sup \w n4n {s;t ) - w n4n (s';t )\ > a + bh 1 
[s,s'e(o,i) 

K 

~ (a + bh' 1 ) 4 + (a + bh' 1 ) 5 (54) 

for a positive constant $K$.

From (54) one can deduce the corresponding bound for

$$P\Big\{ \sup_{s \in (0,1)} |w_{n,d_n}(s; t_0)| > a + bh^{-1} \Big\},$$

which together with (54) and via (50) implies the statement of the
proposition. □



Corollary 3. Suppose the assumptions of Theorem 6 are satisfied; then
Assumption 2 holds for $y_n = g_n + \tilde v_n$ and for $y$ as defined in (11) in each context
[a] and [b] listed in Theorem 6.

Proof. Note first that if $x_{b,n}$ is defined by (37), if $f$ is a $C^1$-function, and
$k$ is a kernel with compact support, then $x'_{b,n}(t + s d_n) \to f'(t)$ for each $t$
when $n \to \infty$. Besides, a consequence of Proposition 4 is that

$$\liminf_{n \to \infty} P\Big( \sup_{s \in (\epsilon h^{-1},\, h^{-1}\max t_i)} \tilde v_n(s) < \frac{M}{2} - \tau(h^{-1}\epsilon - c) \Big) \ge 1 - \delta$$

if there exists a function $\ell(n)$ such that

$$P\Big\{ \max_{1 \le i \le n} t_i < \ell(n) \Big\} \to 1 \quad \text{and} \quad h^{p-1}\ell(n) \to 0, \tag{55}$$

as $n \to \infty$, and with $p \ge 5$. Note that

$$P\Big\{ \max_{1 \le i \le n} t_i > \ell(n) \Big\} \le n\, \frac{E|t_1|^p}{\ell(n)^p}.$$

Thus the conditions in (55) are implied by

$$\ell(n)^{-p} n \to 0, \qquad d_n^{-1}\ell(n)\, d_n^{p} \to 0,$$

which is equivalent to

$$n^{1/p} \ll \ell(n) \ll d_n^{1-p}.$$

In the case of i.i.d. and mixing data, when $d_n = n^{-1/3}$, $p$ should satisfy

$$\frac{1}{p} < \frac{p - 1}{3} \iff p^2 - p > 3 \iff p > \frac{1 + \sqrt{13}}{2}.$$

This, together with the restriction $p \ge 5$ in Proposition 4, implies that the
existence of five moments suffices to establish (B)(i) in Lemma 5 for i.i.d. and
mixing data. Condition (A) in Lemma 5 is immediate from Proposition 4.
Hence Lemma 5 can be applied, so that Assumption 2 holds for $y_n = g_n + \tilde v_n$.
An analogous result for $y$ defined by (11) follows easily from the stationarity
and finite second moments of $\tilde v_n$. □